Obviously, I prefer to have a Rust-powered Paimon-Python implementation. I truly believe it's the future our community should focus on and pursue.
However, I still think these two projects can collaborate by sharing the same Python API definition while providing different implementations. For example, paimon-python can define the APIs and accept paimon-impls that point to implementation packages called paimon-core-java and paimon-core-rust. This should be as simple as "import paimon_core_rust" in Python, provided we build classes that expose the same API.

On Wed, Aug 7, 2024, at 21:35, Jingsong Li wrote:
> Thanks Zelin for driving the paimon-python!
>
> +1 to it.
>
> In my opinion, a Python API definition is important, and we need to
> provide a clear API definition.
>
> About implementation:
>
> - I think the first version can be implemented in py4j, wrapping Java
> implementation and transmitting data through Arrow IPC.
> - At the same time, we have a further optimized version, paimon-rust
> version, which can gradually improve some features.
>
> What do you think? Zelin and Xuanwo
>
> Best,
> Jingsong
>
> On Wed, Aug 7, 2024 at 7:44 PM Xuanwo <xua...@apache.org> wrote:
>>
>> Hi, yu zelin
>>
>> Thank you for initiating this discussion.
>>
>> I'm also working on this. My current plan is to build paimon-rust, followed
>> by paimon-python via pyo3 by exposing the paimon-rust API.
>>
>> PyO3 can build a native Python package without additional dependencies. This
>> way, users can install paimon-python simply by running pip install paimon,
>> without needing any extra setup for Java, Paimon, Flink or other components.
>>
>> Are you interested in this direction?
>>
>> Some context: the iceberg community is also working to use iceberg-rust in
>> pyiceberg directly: https://github.com/apache/iceberg-rust/pull/518
>>
>> On Wed, Aug 7, 2024, at 19:24, yu zelin wrote:
>> > Hi devs,
>> >
>> > I'd like to introduce a Python SDK for Paimon (paimon-python). Python users
>> > can use it to access Paimon data more easily.
>> >
>> > In the first version, I would leverage py4j to wrap the Java SDK with Python
>> > code. Briefly speaking, py4j can start a JVM and
>> > load Java classes, so we can use it to access the Paimon table Java API and get
>> > results in Python code. An example is flink-python:
>> > https://github.com/apache/flink/tree/master/flink-python
>> >
>> > I'd like to give a Paimon example:
>> > ```
>> > class FileStoreTable(object):
>> >
>> >     @classmethod
>> >     def create(cls, context: CatalogContext) -> 'FileStoreTable':
>> >         # gateway is built via py4j to access JVM
>> >         gateway = get_gateway()
>> >         # use gateway.jvm to access java classes
>> >         j_table = gateway.jvm.FileStoreTableFactory.create(
>> >             context.to_j_catalog_context())
>> >         return FileStoreTable(j_table)
>> >
>> >     def __init__(self, j_table):
>> >         self.__j_table = j_table
>> >
>> >     # wrap Java method
>> >     def primary_keys(self) -> List[str]:
>> >         return self.__j_table.primaryKeys()
>> > ```
>> > Then we can wrap the scan and read interfaces to read tables, and the write
>> > and commit interfaces to write tables, via Python.
>> >
>> > Looking forward to your suggestions.
>> >
>> > Best Regards,
>> > Zelin Yu
>>
>> --
>> Xuanwo
>>
>> https://xuanwo.io/

--
Xuanwo

https://xuanwo.io/
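
P.S. To make the shared-API idea at the top of this mail a bit more concrete, here is a minimal sketch of how one API definition could sit in front of several implementations. Everything in it is hypothetical: `Table`, `register_impl`, `open_table` and the two `Fake*Table` classes are names invented for illustration, not existing paimon-python code, and the fake classes only stand in for what a py4j-backed and a PyO3-backed package might provide. Also note that Python module names cannot contain hyphens, so a distribution called paimon-core-rust would have to be imported under a name such as `paimon_core_rust` (again an assumed name).

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class Table(ABC):
    """Hypothetical shared API that every implementation would expose."""

    @abstractmethod
    def primary_keys(self) -> List[str]:
        """Return the primary key column names of the table."""


# Maps an implementation name to a factory that opens a table by path.
_IMPLS: Dict[str, Callable[[str], Table]] = {}


def register_impl(name: str, factory: Callable[[str], Table]) -> None:
    """Called by an implementation package to plug itself in."""
    _IMPLS[name] = factory


def open_table(path: str, impl: str) -> Table:
    """Open a table through the chosen implementation."""
    if impl not in _IMPLS:
        raise ValueError(f"unknown implementation: {impl!r}")
    return _IMPLS[impl](path)


class FakeJavaTable(Table):
    """Stand-in for a py4j-backed class; does not start a JVM."""

    def __init__(self, path: str) -> None:
        self.path = path

    def primary_keys(self) -> List[str]:
        return ["id"]


class FakeRustTable(Table):
    """Stand-in for a PyO3-backed class; contains no native code."""

    def __init__(self, path: str) -> None:
        self.path = path

    def primary_keys(self) -> List[str]:
        return ["id"]


register_impl("java", FakeJavaTable)
register_impl("rust", FakeRustTable)

# User code is identical whichever implementation is selected.
for impl in ("java", "rust"):
    table = open_table("warehouse/db/orders", impl=impl)
    print(impl, type(table).__name__, table.primary_keys())
```

The point of the registry is only that user code depends on `Table` and never on a concrete class, so the py4j version could ship first and a Rust version could be swapped in later without changing callers. A real design might discover implementations through package entry points instead of explicit registration; that is a design choice left open here.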