Thanks, Zelin, for driving paimon-python!

+1 to it.

In my opinion, the Python API definition is the important part, and we need to
make it clear.

About implementation:

- I think the first version can be implemented with py4j, wrapping the Java
implementation and transferring data through Arrow IPC.
- At the same time, we can have a further optimized version based on
paimon-rust, which can gradually pick up features.
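
To illustrate why a clear API definition matters for both routes, here is a
minimal sketch of a backend-neutral interface. All names in it (`Table`,
`register_backend`, `open_table`, `InMemoryTable`) are hypothetical and not
part of any existing Paimon API; the in-memory backend only stands in for a
py4j-based or paimon-rust-based implementation.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


class Table(ABC):
    """Backend-neutral table interface (hypothetical, for illustration)."""

    @abstractmethod
    def primary_keys(self) -> List[str]:
        """Return the primary key column names."""

    @abstractmethod
    def read_rows(self) -> Iterator[Row]:
        """Yield rows as plain dicts."""


# Registry mapping a backend name to a factory that builds a Table.
_BACKENDS: Dict[str, Callable[..., Table]] = {}


def register_backend(name: str, factory: Callable[..., Table]) -> None:
    """Make a backend available under the given name."""
    _BACKENDS[name] = factory


def open_table(backend: str, **options) -> Table:
    """Build a Table from the named backend, passing options to its factory."""
    try:
        factory = _BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return factory(**options)


class InMemoryTable(Table):
    """Stand-in backend; a py4j or Rust backend would implement the same methods."""

    def __init__(self, primary_keys: List[str], rows: List[Row]) -> None:
        self._primary_keys = list(primary_keys)
        self._rows = list(rows)

    def primary_keys(self) -> List[str]:
        return list(self._primary_keys)

    def read_rows(self) -> Iterator[Row]:
        return iter(self._rows)


register_backend("memory", InMemoryTable)

# User code depends only on the Table interface, not on the backend.
table = open_table("memory", primary_keys=["id"], rows=[{"id": 1, "name": "a"}])
```

With a shape like this, user code written against the first version would not
need to change when a different backend is registered later.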

What do you think, Zelin and Xuanwo?

Best,
Jingsong

On Wed, Aug 7, 2024 at 7:44 PM Xuanwo <xua...@apache.org> wrote:
>
> Hi, yu zelin
>
> Thank you for initiating this discussion.
>
> I'm also working on this. My current plan is to build paimon-rust, followed 
> by paimon-python via pyo3 by exposing the paimon-rust API.
>
> PyO3 can build a native Python package without additional dependencies. This 
> way, users can install paimon-python simply by running pip install paimon, 
> without needing any extra setup for Java, Paimon, Flink or other components.
>
> Are you interested in this direction?
>
> Some context: the Iceberg community is also working on using iceberg-rust in
> pyiceberg directly: https://github.com/apache/iceberg-rust/pull/518
>
> On Wed, Aug 7, 2024, at 19:24, yu zelin wrote:
> > Hi devs,
> >
> > I'd like to introduce a Python SDK for Paimon (paimon-python). Python users
> > can use it to access Paimon data more easily.
> >
> > In the first version, I would use py4j to wrap the Java SDK in Python
> > code. Briefly, py4j can start a JVM and load Java classes, so we can use
> > it to access the Paimon table Java API and get results in Python code.
> > An example is flink-python:
> > https://github.com/apache/flink/tree/master/flink-python
> >
> > I'd like to give a Paimon example:
> > ```
> > from typing import List
> >
> > # CatalogContext and get_gateway are assumed to be defined elsewhere.
> >
> >
> > class FileStoreTable(object):
> >
> >     @classmethod
> >     def create(cls, context: CatalogContext) -> 'FileStoreTable':
> >         # gateway is built via py4j to access JVM
> >         gateway = get_gateway()
> >         # use gateway.jvm to access java classes
> >         j_table = gateway.jvm.FileStoreTableFactory.create(
> >             context.to_j_catalog_context())
> >         return FileStoreTable(j_table)
> >
> >     def __init__(self, j_table):
> >         self.__j_table = j_table
> >
> >     # wrap Java method
> >     def primary_keys(self) -> List[str]:
> >         return self.__j_table.primaryKeys()
> > ```
> > Then we can wrap the scan and read interfaces to read tables, and the
> > write and commit interfaces to write tables, from Python.
> >
> > Looking forward to your suggestions.
> >
> > Best Regards,
> > Zelin Yu
>
> --
> Xuanwo
>
> https://xuanwo.io/
