This is an automated email from the ASF dual-hosted git repository.
fokko pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg-python.git
The following commit(s) were added to refs/heads/main by this push:
new fa15877 docs: Add `sqlcatalog` and local fs warehouse (#361)
fa15877 is described below
commit fa1587788dbbb73b8a66a0a59ff7a8083b764707
Author: Kevin Liu <[email protected]>
AuthorDate: Sun Feb 4 13:29:36 2024 -0800
docs: Add `sqlcatalog` and local fs warehouse (#361)
* add sqlcatalog and local fs warehouse
* make lint
* Apply suggestions from code review
Co-authored-by: Fokko Driesprong <[email protected]>
---------
Co-authored-by: Fokko Driesprong <[email protected]>
---
mkdocs/docs/index.md | 35 ++++++++++++++++++++++++++++++++---
1 file changed, 32 insertions(+), 3 deletions(-)
diff --git a/mkdocs/docs/index.md b/mkdocs/docs/index.md
index d82aa65..f1c1fa9 100644
--- a/mkdocs/docs/index.md
+++ b/mkdocs/docs/index.md
@@ -62,6 +62,29 @@ You either need to install `s3fs`, `adlfs`, `gcs`, or `pyarrow` to be able to fe
Iceberg leverages the [catalog to have one centralized place to organize the tables](https://iceberg.apache.org/catalog/). This can be a traditional Hive catalog to store your Iceberg tables next to the rest, a vendor solution like the AWS Glue catalog, or an implementation of Icebergs' own [REST protocol](https://github.com/apache/iceberg/tree/main/open-api). Checkout the [configuration](configuration.md) page to find all the configuration details.
+For the sake of demonstration, we'll configure the catalog to use the `SqlCatalog` implementation, which will store information in a local `sqlite` database. We'll also configure the catalog to store data files in the local filesystem instead of an object store. This should not be used in production due to the limited scalability.
+
+Create a temporary location for Iceberg:
+
+```shell
+mkdir /tmp/warehouse
+```
+
+Open a Python 3 REPL to set up the catalog:
+
+```python
+from pyiceberg.catalog.sql import SqlCatalog
+
+warehouse_path = "/tmp/warehouse"
+catalog = SqlCatalog(
+    "default",
+    **{
+        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
+        "warehouse": f"file://{warehouse_path}",
+    },
+)
+```
+
## Write a PyArrow dataframe
Let's take the Taxi dataset, and write this to an Iceberg table.
@@ -83,9 +106,7 @@ df = pq.read_table("/tmp/yellow_tripdata_2023-01.parquet")
Create a new Iceberg table:
```python
-from pyiceberg.catalog import load_catalog
-
-catalog = load_catalog("default")
+catalog.create_namespace("default")
table = catalog.create_table(
"default.taxi_dataset",
@@ -158,6 +179,14 @@ df = table.scan(row_filter="tip_per_mile > 0").to_arrow()
len(df)
```
+### Explore Iceberg data and metadata files
+
+Since the catalog was configured to use the local filesystem, we can explore how Iceberg saved data and metadata files from the above operations.
+
+```shell
+find /tmp/warehouse/
+```
+
## More details
For the details, please check the [CLI](cli.md) or [Python API](api.md) page.
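
The two catalog properties introduced in the diff above (`uri` and `warehouse`) both derive from a single warehouse directory, and the `find /tmp/warehouse/` step has a direct standard-library equivalent. The following is a minimal sketch, not part of the commit: the helper names `local_catalog_properties` and `list_warehouse_files` are hypothetical, it assumes a POSIX-style absolute path, and it only builds the property strings and lists files, so it does not need PyIceberg installed.

```python
import tempfile
from pathlib import Path


def local_catalog_properties(warehouse_path: Path) -> dict[str, str]:
    """Build the `uri` and `warehouse` properties used in the diff.

    Hypothetical helper: it mirrors the f-strings from the committed docs,
    with the SQLite database file placed inside the warehouse directory.
    """
    warehouse_path = warehouse_path.resolve()
    return {
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": f"file://{warehouse_path}",
    }


def list_warehouse_files(warehouse_path: Path) -> list[str]:
    """Return every regular file below the warehouse, relative and sorted.

    Roughly what `find <warehouse> -type f` reports, as POSIX-style paths.
    """
    return sorted(
        p.relative_to(warehouse_path).as_posix()
        for p in warehouse_path.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # A throwaway directory stands in for /tmp/warehouse here.
    with tempfile.TemporaryDirectory() as tmp:
        warehouse = Path(tmp) / "warehouse"
        warehouse.mkdir(parents=True, exist_ok=True)
        props = local_catalog_properties(warehouse)
        print(props["uri"])
        print(props["warehouse"])
        print(list_warehouse_files(warehouse))
```

Under these assumptions, the returned dictionary could be unpacked into the constructor as in the diff, `SqlCatalog("default", **props)`, and `list_warehouse_files` could be called again after writing a table to see which data and metadata files appeared.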