Hi Jingsong and Xuanwo,

Thanks for your suggestions! I agree that a Rust-based paimon-python is better, but in order to launch the Python SDK sooner, I think it's necessary to introduce a 'paimon-core-java' implementation. So I will draft an API definition and we can discuss it further.
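To make the idea of one API definition with several implementations concrete, here is a minimal sketch of how it could look. All names in it (Table, register_impl, load_table, InMemoryTable) are hypothetical and only for illustration, not a proposed final API; the in-memory class stands in for what packages such as paimon-core-java or paimon-core-rust might provide.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class Table(ABC):
    """Hypothetical shared API that every implementation would expose."""

    @abstractmethod
    def primary_keys(self) -> List[str]:
        ...


# Registry mapping an implementation name to a factory for Table objects.
_IMPLS: Dict[str, Callable[..., Table]] = {}


def register_impl(name: str):
    """Class decorator an implementation package would apply on import."""
    def decorator(cls):
        _IMPLS[name] = cls
        return cls
    return decorator


def load_table(impl: str, *args, **kwargs) -> Table:
    """Create a table using the implementation registered under `impl`."""
    try:
        factory = _IMPLS[impl]
    except KeyError:
        raise ValueError(
            f"unknown implementation {impl!r}; registered: {sorted(_IMPLS)}"
        ) from None
    return factory(*args, **kwargs)


# Stand-in implementation (assumption): a real package would wrap the
# Java or Rust core here instead of keeping the keys in memory.
@register_impl("in-memory")
class InMemoryTable(Table):
    def __init__(self, primary_keys: List[str]):
        self._primary_keys = list(primary_keys)

    def primary_keys(self) -> List[str]:
        return list(self._primary_keys)


if __name__ == "__main__":
    table = load_table("in-memory", ["id", "dt"])
    print(table.primary_keys())
```

A real setup could discover implementations through Python package entry points rather than an explicit decorator, which would be closer to Java SPI; that is my assumption, not something decided in this thread.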
On Thu, Aug 8, 2024 at 9:25 AM Kerwin <zhuang.ker...@gmail.com> wrote:

> +1
>
> Looking forward to the implementation of python.
>
> --
> Best Wish
> Kerwin (zhuangchong)
>
> Jingsong Li <jingsongl...@gmail.com> wrote on Thu, Aug 8, 2024 at 09:12:
>
> > Yes, we need a plugin mechanism.
> >
> > Something like Java SPI is good, implementation based on dependencies or
> > import.
> >
> > Xuanwo <xua...@apache.org> wrote on Wed, Aug 7, 2024 at 22:34:
> >
> > > Obviously, I prefer to have a Rust-powered Paimon-Python implementation.
> > > I truly believe it's the future our community should focus on and pursue.
> > >
> > > However, I still think these two projects can collaborate by sharing the
> > > same Python API definition but with different implementations.
> > >
> > > For example, paimon-python can define APIs and accept paimon-impls that
> > > point to implementation packages called paimon-core-java and
> > > paimon-core-rust. This should be as simple as "import paimon-core-rust"
> > > in Python, provided we build classes that expose the same API.
> > >
> > > On Wed, Aug 7, 2024, at 21:35, Jingsong Li wrote:
> > > > Thanks Zelin for driving the paimon-python!
> > > >
> > > > +1 to it.
> > > >
> > > > In my opinion, a Python API definition is important, and we need to
> > > > provide a clear API definition.
> > > >
> > > > About implementation:
> > > >
> > > > - I think the first version can be implemented in py4j, wrapping Java
> > > > implementation and transmitting data through Arrow IPC.
> > > > - At the same time, we have a further optimized version, paimon-rust
> > > > version, which can gradually improve some features.
> > > >
> > > > What do you think? Zelin and Xuanwo
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Wed, Aug 7, 2024 at 7:44 PM Xuanwo <xua...@apache.org> wrote:
> > > >>
> > > >> Hi, yu zelin
> > > >>
> > > >> Thank you for initiating this discussion.
> > > >>
> > > >> I'm also working on this. My current plan is to build paimon-rust,
> > > >> followed by paimon-python via pyo3 by exposing the paimon-rust API.
> > > >>
> > > >> PyO3 can build a native Python package without additional dependencies.
> > > >> This way, users can install paimon-python simply by running pip install
> > > >> paimon, without needing any extra setup for Java, Paimon, Flink or other
> > > >> components.
> > > >>
> > > >> Are you interested in this direction?
> > > >>
> > > >> Some context: the Iceberg community is also working on using iceberg-rust
> > > >> in pyiceberg directly: https://github.com/apache/iceberg-rust/pull/518
> > > >>
> > > >> On Wed, Aug 7, 2024, at 19:24, yu zelin wrote:
> > > >> > Hi devs,
> > > >> >
> > > >> > I'd like to introduce a Python SDK for Paimon (paimon-python). Python
> > > >> > users can use it to access Paimon data more easily.
> > > >> >
> > > >> > In the first version, I would leverage py4j to wrap the Java SDK with
> > > >> > Python code. Briefly speaking, py4j can start a JVM and load Java
> > > >> > classes, so we can use it to access the Paimon table Java API and get
> > > >> > results in Python code. An example is flink-python:
> > > >> > https://github.com/apache/flink/tree/master/flink-python
> > > >> >
> > > >> > I'd like to give a Paimon example:
> > > >> > ```
> > > >> > from typing import List
> > > >> >
> > > >> > # CatalogContext and get_gateway are assumed to be defined elsewhere
> > > >> > class FileStoreTable(object):
> > > >> >
> > > >> >     @classmethod
> > > >> >     def create(cls, context: CatalogContext) -> 'FileStoreTable':
> > > >> >         # gateway is built via py4j to access JVM
> > > >> >         gateway = get_gateway()
> > > >> >         # use gateway.jvm to access java classes
> > > >> >         j_table = gateway.jvm.FileStoreTableFactory.create(
> > > >> >             context.to_j_catalog_context())
> > > >> >         return FileStoreTable(j_table)
> > > >> >
> > > >> >     def __init__(self, j_table):
> > > >> >         self.__j_table = j_table
> > > >> >
> > > >> >     # wrap Java method
> > > >> >     def primary_keys(self) -> List[str]:
> > > >> >         return self.__j_table.primaryKeys()
> > > >> > ```
> > > >> > Then we can wrap the scan and read interfaces to read a table, and the
> > > >> > write and commit interfaces to write a table, via Python.
> > > >> >
> > > >> > Looking forward to your suggestions.
> > > >> >
> > > >> > Best Regards,
> > > >> > Zelin Yu
> > > >>
> > > >> --
> > > >> Xuanwo
> > > >>
> > > >> https://xuanwo.io/
> > >
> > > --
> > > Xuanwo
> > >
> > > https://xuanwo.io/