(iceberg-python) branch main updated: docs: Add `sqlcatalog` and local fs warehouse (#361)

fokko Sun, 04 Feb 2024 13:30:10 -0800

This is an automated email from the ASF dual-hosted git repository.

fokko pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg-python.git



The following commit(s) were added to refs/heads/main by this push:
     new fa15877  docs: Add `sqlcatalog` and local fs warehouse (#361)
fa15877 is described below

commit fa1587788dbbb73b8a66a0a59ff7a8083b764707
Author: Kevin Liu <[email protected]>
AuthorDate: Sun Feb 4 13:29:36 2024 -0800

    docs: Add `sqlcatalog` and local fs warehouse (#361)
    
    * add sqlcatalog and local fs warehouse
    
    * make lint
    
    * Apply suggestions from code review
    
    Co-authored-by: Fokko Driesprong <[email protected]>
    
    ---------
    
    Co-authored-by: Fokko Driesprong <[email protected]>
---
 mkdocs/docs/index.md | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/mkdocs/docs/index.md b/mkdocs/docs/index.md
index d82aa65..f1c1fa9 100644
--- a/mkdocs/docs/index.md
+++ b/mkdocs/docs/index.md
@@ -62,6 +62,29 @@ You either need to install `s3fs`, `adlfs`, `gcs`, or `pyarrow` to be able to fe
 
 Iceberg leverages the [catalog to have one centralized place to organize the tables](https://iceberg.apache.org/catalog/). This can be a traditional Hive catalog to store your Iceberg tables next to the rest, a vendor solution like the AWS Glue catalog, or an implementation of Icebergs' own [REST protocol](https://github.com/apache/iceberg/tree/main/open-api). Checkout the [configuration](configuration.md) page to find all the configuration details.
 
+For the sake of demonstration, we'll configure the catalog to use the `SqlCatalog` implementation, which will store information in a local `sqlite` database. We'll also configure the catalog to store data files in the local filesystem instead of an object store. This should not be used in production due to the limited scalability.
+
+Create a temporary location for Iceberg:
+
+```shell
+mkdir /tmp/warehouse
+```
+
+Open a Python 3 REPL to set up the catalog:
+
+```python
+from pyiceberg.catalog.sql import SqlCatalog
+
+warehouse_path = "/tmp/warehouse"
+catalog = SqlCatalog(
+    "default",
+    **{
+        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
+        "warehouse": f"file://{warehouse_path}",
+    },
+)
+```
+
 ## Write a PyArrow dataframe
 
 Let's take the Taxi dataset, and write this to an Iceberg table.
@@ -83,9 +106,7 @@ df = pq.read_table("/tmp/yellow_tripdata_2023-01.parquet")
 Create a new Iceberg table:
 
 ```python
-from pyiceberg.catalog import load_catalog
-
-catalog = load_catalog("default")
+catalog.create_namespace("default")
 
 table = catalog.create_table(
     "default.taxi_dataset",
@@ -158,6 +179,14 @@ df = table.scan(row_filter="tip_per_mile > 0").to_arrow()
 len(df)
 ```
 
+### Explore Iceberg data and metadata files
+
+Since the catalog was configured to use the local filesystem, we can explore how Iceberg saved data and metadata files from the above operations.
+
+```shell
+find /tmp/warehouse/
+```
+
 ## More details
 
 For the details, please check the [CLI](cli.md) or [Python API](api.md) page.
