Obviously, I would prefer a Rust-powered paimon-python implementation. I
truly believe it's the direction our community should focus on and pursue.

However, I still think these two projects can collaborate by sharing the same
Python API definition while providing different implementations.
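To make "the same Python API definition" concrete, here is a minimal sketch. It assumes a hypothetical `Table` abstract base class living in paimon-python. The method names are illustrative only (`primary_keys` mirrors Zelin's example in the quoted proposal), and `InMemoryTable` is a toy stand-in for a real Java-backed or Rust-backed implementation.

```python
from abc import ABC, abstractmethod
from typing import List


class Table(ABC):
    """Hypothetical shared API that every paimon-python backend would implement."""

    @abstractmethod
    def primary_keys(self) -> List[str]:
        """Return the primary key column names of the table."""

    @abstractmethod
    def read_rows(self) -> List[dict]:
        """Return all rows; a real API would more likely stream Arrow batches."""


class InMemoryTable(Table):
    """Toy backend used only to show that a subclass satisfies the shared API."""

    def __init__(self, keys: List[str], rows: List[dict]):
        self._keys = list(keys)
        self._rows = list(rows)

    def primary_keys(self) -> List[str]:
        return list(self._keys)

    def read_rows(self) -> List[dict]:
        return list(self._rows)
```

Because `Table` is abstract, it cannot be instantiated directly, so each implementation package is forced to provide every method in the shared definition.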

For example, paimon-python could define the API and accept pluggable
implementations ("paimon-impls") provided by packages such as paimon-core-java
and paimon-core-rust. Choosing a backend should then be as simple as
"import paimon_core_rust" in Python, provided both packages expose classes
with the same API.
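A minimal sketch of how paimon-python might pick an implementation at import time. It assumes hypothetical module names `paimon_core_rust` and `paimon_core_java` (Python module names cannot contain hyphens); neither is a published package. It tries the Rust package first and falls back to the Java one.

```python
import importlib
from types import ModuleType
from typing import Sequence

# Hypothetical module names for the two implementation packages; these are
# illustrations only, not published packages.
BACKEND_MODULES = {
    "rust": "paimon_core_rust",
    "java": "paimon_core_java",
}


def load_backend(preferred: Sequence[str] = ("rust", "java")) -> ModuleType:
    """Import and return the first installed backend, in order of preference."""
    missing = []
    for name in preferred:
        module_name = BACKEND_MODULES[name]
        try:
            return importlib.import_module(module_name)
        except ImportError:
            # Backend not installed (or failed to import); try the next one.
            missing.append(module_name)
    raise ImportError(
        "no paimon implementation installed; tried: " + ", ".join(missing)
    )
```

Since both packages would expose the same API, calling code would not need to change whichever backend ends up being loaded.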

On Wed, Aug 7, 2024, at 21:35, Jingsong Li wrote:
> Thanks Zelin for driving the paimon-python!
>
> +1 to it.
>
> In my opinion, a Python API definition is important, and we need to
> provide a clear API definition.
>
> About implementation:
>
> - I think the first version can be implemented with py4j, wrapping the Java
> implementation and transferring data through Arrow IPC.
> - At the same time, we can have a further optimized version based on
> paimon-rust, which can gradually add and improve features.
>
> What do you think, Zelin and Xuanwo?
>
> Best,
> Jingsong
>
> On Wed, Aug 7, 2024 at 7:44 PM Xuanwo <xua...@apache.org> wrote:
>>
>> Hi, yu zelin
>>
>> Thank you for initiating this discussion.
>>
>> I'm also working on this. My current plan is to build paimon-rust first,
>> then build paimon-python on top of it with PyO3 by exposing the paimon-rust API.
>>
>> PyO3 can build a native Python package without additional dependencies. This 
>> way, users can install paimon-python simply by running pip install paimon, 
>> without needing any extra setup for Java, Paimon, Flink or other components.
>>
>> Are you interested in this direction?
>>
>> Some context: the Iceberg community is also working on using iceberg-rust
>> directly in pyiceberg: https://github.com/apache/iceberg-rust/pull/518
>>
>> On Wed, Aug 7, 2024, at 19:24, yu zelin wrote:
>> > Hi devs,
>> >
>> > I'd like to introduce a Python SDK for Paimon (paimon-python). Python users
>> > can use it to access Paimon data more easily.
>> >
>> > In the first version, I would use py4j to wrap the Java SDK in Python
>> > code. Briefly, py4j can start a JVM and load Java classes, so we can use
>> > it to call the Paimon table Java API and get the results in Python code.
>> > An example is flink-python:
>> > https://github.com/apache/flink/tree/master/flink-python
>> >
>> > Here is a Paimon example:
>> > ```
>> > from typing import List
>> >
>> > # get_gateway() and CatalogContext are helpers of the proposed
>> > # paimon-python package (not shown here).
>> >
>> >
>> > class FileStoreTable(object):
>> >
>> >     @classmethod
>> >     def create(cls, context: CatalogContext) -> 'FileStoreTable':
>> >         # gateway is built via py4j to access the JVM
>> >         gateway = get_gateway()
>> >         # use gateway.jvm to access Java classes; assumes
>> >         # FileStoreTableFactory was imported via py4j's java_import
>> >         j_table = gateway.jvm.FileStoreTableFactory.create(
>> >             context.to_j_catalog_context())
>> >         return FileStoreTable(j_table)
>> >
>> >     def __init__(self, j_table):
>> >         self.__j_table = j_table
>> >
>> >     # wrap a Java method; convert the py4j JavaList to a Python list
>> >     def primary_keys(self) -> List[str]:
>> >         return list(self.__j_table.primaryKeys())
>> > ```
>> > Then we can wrap the scan and read interfaces to read tables, and the
>> > write and commit interfaces to write tables, from Python.
>> >
>> > Looking forward to your suggestions.
>> >
>> > Best Regards,
>> > Zelin Yu
>>
>> --
>> Xuanwo
>>
>> https://xuanwo.io/

-- 
Xuanwo

https://xuanwo.io/
