Thanks Zelin for driving the paimon-python! +1 to it.
In my opinion, a clear Python API definition is important, and we need to provide one.

About implementation:
- I think the first version can be implemented with py4j, wrapping the Java implementation and transmitting data through Arrow IPC.
- At the same time, we have a further optimized version, the paimon-rust version, which can gradually improve some features.

What do you think, Zelin and Xuanwo?

Best,
Jingsong

On Wed, Aug 7, 2024 at 7:44 PM Xuanwo <xua...@apache.org> wrote:
>
> Hi, yu zelin
>
> Thank you for initiating this discussion.
>
> I'm also working on this. My current plan is to build paimon-rust, followed
> by paimon-python via pyo3 by exposing the paimon-rust API.
>
> PyO3 can build a native Python package without additional dependencies. This
> way, users can install paimon-python simply by running pip install paimon,
> without needing any extra setup for Java, Paimon, Flink or other components.
>
> Are you interested in this direction?
>
> Some context: the iceberg community is also working on using iceberg-rust in
> pyiceberg directly: https://github.com/apache/iceberg-rust/pull/518
>
> On Wed, Aug 7, 2024, at 19:24, yu zelin wrote:
> > Hi devs,
> >
> > I'd like to introduce a Python SDK for Paimon (paimon-python). Python users
> > can use it to access Paimon data more easily.
> >
> > In the first version, I would leverage py4j to wrap the Java SDK with Python
> > code. Briefly speaking, py4j can start a JVM and
> > load Java classes, so we can use it to access the Paimon table Java API and get
> > results in Python code.
> > An example is flink-python:
> > https://github.com/apache/flink/tree/master/flink-python
> >
> > I'd like to give a Paimon example:
> > ```
> > from typing import List
> >
> > class FileStoreTable(object):
> >
> >     @classmethod
> >     def create(cls, context: CatalogContext) -> 'FileStoreTable':
> >         # gateway is built via py4j to access JVM
> >         gateway = get_gateway()
> >         # use gateway.jvm to access java classes
> >         j_table = gateway.jvm.FileStoreTableFactory.create(context.to_j_catalog_context())
> >         return FileStoreTable(j_table)
> >
> >     def __init__(self, j_table):
> >         self.__j_table = j_table
> >
> >     # wrap Java method
> >     def primary_keys(self) -> List[str]:
> >         return self.__j_table.primaryKeys()
> > ```
> > Then we can wrap the scan and read interfaces to read tables, and the write and
> > commit interfaces to write tables via Python.
> >
> > Looking forward to your suggestions.
> >
> > Best Regards,
> > Zelin Yu
>
> --
> Xuanwo
>
> https://xuanwo.io/
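
As a rough illustration of the py4j wrapping pattern in Zelin's example, here is a minimal sketch that runs without a JVM. It is not the paimon-python API: `FakeJavaTable` is a hypothetical stand-in for the proxy object that `gateway.jvm` would return, and `TableWrapper` is an illustrative name. The only point it shows is that the Python class delegates to the Java-side object and hands back plain Python types.

```python
from typing import List


class FakeJavaTable:
    """Hypothetical stand-in for a py4j proxy of a Java table object."""

    def __init__(self, primary_keys: List[str]):
        self._primary_keys = primary_keys

    # Mirrors the Java-style method name used in the example above.
    def primaryKeys(self):
        # Return a non-list sequence to mimic a Java-side collection.
        return tuple(self._primary_keys)


class TableWrapper:
    """Illustrative Python wrapper that delegates to the Java-side object."""

    def __init__(self, j_table):
        self._j_table = j_table

    def primary_keys(self) -> List[str]:
        # Copy into a plain Python list so callers never hold a Java-side collection.
        return list(self._j_table.primaryKeys())


table = TableWrapper(FakeJavaTable(["id", "dt"]))
print(table.primary_keys())
```

With a real gateway, the stand-in would be replaced by the object obtained from the JVM; the wrapper class itself would not need to change, which is one reason to settle the Python API definition first.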