luoyuxia commented on code in PR #204:
URL: https://github.com/apache/paimon-rust/pull/204#discussion_r3035477162


##########
bindings/python/project-description.md:
##########
@@ -0,0 +1,69 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements. See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership. The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License. You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied. See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+-->
+
+# PyPaimon Core
+
+This project builds the Rust-powered core for [PyPaimon](https://paimon.apache.org/docs/master/pypaimon/overview/) while also providing DataFusion integration for querying Paimon tables.
+
+Install via PyPI:
+
+```
+pip install pypaimon-core
+```
+
+If you want to use the native Python DataFusion `SessionContext`, install `datafusion` as well.
+
+## Query Paimon Tables with DataFusion
+
+`pypaimon-core` provides a `PaimonCatalog` that can be registered into the native DataFusion `SessionContext`.
+This keeps the standard DataFusion Python API available for regular queries.
+
+```python
+from datafusion import SessionContext
+from pypaimon_core.datafusion import PaimonCatalog
+
+catalog = PaimonCatalog({
+    "warehouse": "/path/to/warehouse",
+})
+
+ctx = SessionContext()
+ctx.register_catalog_provider("paimon", catalog)
+
+# Query tables via SQL (catalog.database.table)
+df = ctx.sql("SELECT * FROM paimon.default.my_table LIMIT 10")
+df.show()
+```
+
+### REST Catalog
+
+```python
+from datafusion import SessionContext
+from pypaimon_core.datafusion import PaimonCatalog
+
+catalog = PaimonCatalog({
+    "metastore": "rest",
+    "uri": "http://localhost:8080",
+    "warehouse": "my_warehouse",
+})
+
+ctx = SessionContext()
+ctx.register_catalog_provider("paimon", catalog)
+```
+
+Time travel queries are not supported in the Python binding at this time.

Review Comment:
**Why time travel is not supported in this PR**
- This Python integration intentionally uses the native DataFusion `SessionContext` plus a `CatalogProvider`.
- That path works well for normal table queries, but time travel requires a planner-level extension rather than simple catalog/table registration.
- DataFusion Python/FFI does not currently expose a suitable planner extension point for this, so we cannot cleanly wire time-travel semantics into the native `SessionContext` here.

**What would be needed to support time travel**
- Either introduce a custom context that owns the planner extension,
- or wait for / contribute planner-level registration support in DataFusion Python/FFI,
- or expose time travel through a separate explicit API instead of native SQL/catalog resolution.

**Why not use a custom context now**
- The goal of this PR is to stay aligned with native DataFusion usage and keep the Python integration lightweight and predictable.
- If we introduce a custom context and still want a user experience close to the native `SessionContext`, we would need to re-expose a large part of the `SessionContext` API surface there, which is relatively heavy to implement and maintain.
- It would also introduce another API model and make the overall Python experience less consistent with native DataFusion.
- If time travel becomes a strong requirement, we can revisit it and design dedicated support separately.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
