This is an automated email from the ASF dual-hosted git repository. timsaucer pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/datafusion-python.git
The following commit(s) were added to refs/heads/main by this push: new b7d3519d docs: add apache iceberg as datafusion data source (#1240) b7d3519d is described below commit b7d3519d395025183a06aad268ee30e61f8226df Author: Kevin Liu <kevinjq...@users.noreply.github.com> AuthorDate: Tue Sep 16 14:32:35 2025 -0700 docs: add apache iceberg as datafusion data source (#1240) * add iceberg as data source * fix warning --- docs/source/user-guide/data-sources.rst | 37 ++++++++++++++++++++++++++++++--- 1 file changed, 34 insertions(+), 3 deletions(-) diff --git a/docs/source/user-guide/data-sources.rst b/docs/source/user-guide/data-sources.rst index 7d07c67d..a9b119b9 100644 --- a/docs/source/user-guide/data-sources.rst +++ b/docs/source/user-guide/data-sources.rst @@ -172,10 +172,41 @@ which can lead to a significant performance difference. df = ctx.table("my_delta_table") df.show() -Iceberg -------- +Apache Iceberg +-------------- -Coming soon! +DataFusion 45.0.0 and later support the ability to register Apache Iceberg tables as table providers through the Custom Table Provider interface. + +This requires either the `pyiceberg <https://pypi.org/project/pyiceberg/>`__ library (>=0.10.0) or the `pyiceberg-core <https://pypi.org/project/pyiceberg-core/>`__ library (>=0.5.0). + +* The ``pyiceberg-core`` library exposes Iceberg Rust's implementation of the Custom Table Provider interface as python bindings. +* The ``pyiceberg`` library utilizes the ``pyiceberg-core`` python bindings under the hood and provides a native way for Python users to interact with the DataFusion. + +.. code-block:: python + + from datafusion import SessionContext + from pyiceberg.catalog import load_catalog + import pyarrow as pa + + # Load catalog and create/load a table + catalog = load_catalog("catalog", type="in-memory") + catalog.create_namespace_if_not_exists("default") + + # Create some sample data + data = pa.table({"x": [1, 2, 3], "y": [4, 5, 6]}) + iceberg_table = catalog.create_table("default.test", schema=data.schema) + iceberg_table.append(data) + + # Register the table with DataFusion + ctx = SessionContext() + ctx.register_table_provider("test", iceberg_table) + + # Query the table using DataFusion + ctx.table("test").show() + + +Note that the Datafusion integration rely on features from the `Iceberg Rust <https://github.com/apache/iceberg-rust/>`_ implementation instead of the `PyIceberg <https://github.com/apache/iceberg-python/>`_ implementation. +Features that are available in PyIceberg but not yet in Iceberg Rust will not be available when using DataFusion. Custom Table Provider --------------------- --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org For additional commands, e-mail: commits-h...@datafusion.apache.org