This is an automated email from the ASF dual-hosted git repository.
timsaucer pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-python.git
The following commit(s) were added to refs/heads/main by this push:
new 4b215724 Add a working, more complete example of using a catalog (docs) (#1427)
4b215724 is described below
commit 4b215724565cec4257ed9dfa25271c5481c9f7b4
Author: Topias Pyykkönen <[email protected]>
AuthorDate: Fri Mar 27 17:17:44 2026 +0200
Add a working, more complete example of using a catalog (docs) (#1427)
* Add a working, more complete example of using a catalog
* the default schema is 'public', not 'default'
* in-memory table instead of imaginary csv for standalone example
* typo fix
Co-authored-by: Kevin Liu <[email protected]>
* minor c string fix after merge
---------
Co-authored-by: Kevin Liu <[email protected]>
Co-authored-by: Tim Saucer <[email protected]>
---
crates/core/src/context.rs | 2 +-
docs/source/user-guide/data-sources.rst | 26 +++++++++++++++++++-------
2 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/crates/core/src/context.rs b/crates/core/src/context.rs
index 4880b92b..53994d2f 100644
--- a/crates/core/src/context.rs
+++ b/crates/core/src/context.rs
@@ -196,7 +196,7 @@ impl PySessionConfig {
let capsule = capsule.cast::<PyCapsule>()?;
let extension: NonNull<FFI_ExtensionOptions> = capsule
- .pointer_checked(Some(c_str!("datafusion_extension_options")))?
+ .pointer_checked(Some(c"datafusion_extension_options"))?
.cast();
let mut extension = unsafe { extension.as_ref() }.clone();
diff --git a/docs/source/user-guide/data-sources.rst b/docs/source/user-guide/data-sources.rst
index 26f1303c..48ff4c01 100644
--- a/docs/source/user-guide/data-sources.rst
+++ b/docs/source/user-guide/data-sources.rst
@@ -224,25 +224,37 @@ A common technique for organizing tables is using a three level hierarchical app
supports this form of organizing using the :py:class:`~datafusion.catalog.Catalog`, :py:class:`~datafusion.catalog.Schema`, and :py:class:`~datafusion.catalog.Table`. By default,
a :py:class:`~datafusion.context.SessionContext` comes with a single Catalog and a single Schema
-with the names ``datafusion`` and ``default``, respectively.
+with the names ``datafusion`` and ``public``, respectively.
The default implementation uses an in-memory approach to the catalog and schema. We have support
-for adding additional in-memory catalogs and schemas. This can be done like in the following
+for adding additional in-memory catalogs and schemas. You can access tables registered in a schema
+either through the Dataframe API or via sql commands. This can be done like in the following
example:
.. code-block:: python
+ import pyarrow as pa
from datafusion.catalog import Catalog, Schema
+ from datafusion import SessionContext
+
+ ctx = SessionContext()
my_catalog = Catalog.memory_catalog()
- my_schema = Schema.memory_schema()
+ my_schema = Schema.memory_schema()
+ my_catalog.register_schema('my_schema_name', my_schema)
+ ctx.register_catalog_provider('my_catalog_name', my_catalog)
- my_catalog.register_schema("my_schema_name", my_schema)
+ # Create an in-memory table
+ table = pa.table({
+ 'name': ['Bulbasaur', 'Charmander', 'Squirtle'],
+ 'type': ['Grass', 'Fire', 'Water'],
+ 'hp': [45, 39, 44],
+ })
+ df = ctx.create_dataframe([table.to_batches()], name='pokemon')
- ctx.register_catalog("my_catalog_name", my_catalog)
+ my_schema.register_table('pokemon', df)
-You could then register tables in ``my_schema`` and access them either through the DataFrame
-API or via sql commands such as ``"SELECT * from my_catalog_name.my_schema_name.my_table"``.
+ ctx.sql('SELECT * FROM my_catalog_name.my_schema_name.pokemon').show()
User Defined Catalog and Schema
-------------------------------
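
The documentation change above relies on three-level table references of the form ``catalog.schema.table``, with ``datafusion`` and ``public`` as the default catalog and schema names. As a rough illustration only, the sketch below shows how a bare, partial, or full reference could expand to all three parts. The ``qualify`` helper is hypothetical and not part of datafusion-python; it assumes unqualified names fall back to the defaults and it ignores quoted identifiers.

```python
# Hypothetical helper, NOT part of datafusion-python. It only illustrates
# how a three-level table reference breaks down into its parts.
DEFAULT_CATALOG = "datafusion"  # default catalog name stated in the docs
DEFAULT_SCHEMA = "public"       # default schema name, as corrected by this commit


def qualify(reference: str) -> tuple[str, str, str]:
    """Expand a bare, partial, or full reference to (catalog, schema, table)."""
    parts = reference.split(".")
    # Reject empty segments and anything deeper than catalog.schema.table.
    if not 1 <= len(parts) <= 3 or any(not p for p in parts):
        raise ValueError(f"invalid table reference: {reference!r}")
    if len(parts) == 1:
        # Bare table name: assume both defaults apply.
        return (DEFAULT_CATALOG, DEFAULT_SCHEMA, parts[0])
    if len(parts) == 2:
        # schema.table: assume only the catalog defaults.
        return (DEFAULT_CATALOG, parts[0], parts[1])
    return (parts[0], parts[1], parts[2])


print(qualify("pokemon"))
print(qualify("my_schema_name.pokemon"))
print(qualify("my_catalog_name.my_schema_name.pokemon"))
```

Under these assumptions, the query in the new docs example names all three levels explicitly, so it does not depend on either default.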