Hi Jingsong and Xuanwo,

Thanks for your suggestions! I agree that a Rust-based paimon-python is
better, but to launch the Python SDK sooner, I think it's necessary to
introduce a 'paimon-core-java' implementation. I will draft an API
definition so that we can discuss it further.
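
As a starting point for that discussion, a shared API with pluggable
implementations could look roughly like the sketch below. Everything in it
is an assumption for illustration: the names Table, register_backend and
load_table, and the "in-memory" stand-in backend, are hypothetical and not
part of any existing Paimon package. A 'paimon-core-java' or
'paimon-core-rust' package would register its own factory in the same way.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class Table(ABC):
    """Hypothetical shared API that every backend would implement."""

    @abstractmethod
    def primary_keys(self) -> List[str]:
        """Return the primary key column names of the table."""


# Registry mapping a backend name to a factory that builds a Table.
_BACKENDS: Dict[str, Callable[[dict], Table]] = {}


def register_backend(name: str, factory: Callable[[dict], Table]) -> None:
    """Called by an implementation package to make itself available."""
    _BACKENDS[name] = factory


def load_table(backend: str, options: dict) -> Table:
    """Build a table using the backend registered under the given name."""
    try:
        factory = _BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; registered: {sorted(_BACKENDS)}"
        ) from None
    return factory(options)


class InMemoryTable(Table):
    """Stand-in backend used only to exercise the sketch."""

    def __init__(self, options: dict):
        self._keys = list(options.get("primary-keys", []))

    def primary_keys(self) -> List[str]:
        return list(self._keys)


register_backend("in-memory", InMemoryTable)
```

With this shape, user code depends only on Table and load_table, so moving
from one implementation to another would be a change of backend name rather
than a change of API.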


On Thu, Aug 8, 2024 at 9:25 AM Kerwin <zhuang.ker...@gmail.com> wrote:

> +1
>
> Looking forward to the Python implementation.
>
> --
> Best wishes
> — Kerwin (zhuangchong)
>
>
>
> On Thu, Aug 8, 2024 at 9:12 AM Jingsong Li <jingsongl...@gmail.com> wrote:
>
> > Yes, we need a plugin mechanism.
> >
> > Something like Java SPI would be good, with the implementation selected
> > based on dependencies or import.
> >
> >
> > On Wed, Aug 7, 2024 at 10:34 PM Xuanwo <xua...@apache.org> wrote:
> >
> > > Obviously, I prefer to have a Rust-powered Paimon-Python implementation. I
> > > truly believe it's the future our community should focus on and pursue.
> > >
> > > However, I still think these two projects can collaborate by sharing the
> > > same Python API definition but with different implementations.
> > >
> > > For example, paimon-python can define APIs and accept paimon-impls that
> > > point to implementation packages called paimon-core-java and
> > > paimon-core-rust. This should be as simple as "import paimon-core-rust" in
> > > Python, provided we build classes that expose the same API.
> > >
> > > On Wed, Aug 7, 2024, at 21:35, Jingsong Li wrote:
> > > > Thanks Zelin for driving the paimon-python!
> > > >
> > > > +1 to it.
> > > >
> > > > In my opinion, a Python API definition is important, and we need to
> > > > provide a clear API definition.
> > > >
> > > > About implementation:
> > > >
> > > > - I think the first version can be implemented with py4j, wrapping the
> > > > Java implementation and transmitting data through Arrow IPC.
> > > > - At the same time, we can have a further optimized version based on
> > > > paimon-rust, which can gradually improve some features.
> > > >
> > > > What do you think, Zelin and Xuanwo?
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Wed, Aug 7, 2024 at 7:44 PM Xuanwo <xua...@apache.org> wrote:
> > > >>
> > > >> Hi, yu zelin
> > > >>
> > > >> Thank you for initiating this discussion.
> > > >>
> > > > >> I'm also working on this. My current plan is to build paimon-rust,
> > > > >> followed by paimon-python via PyO3 by exposing the paimon-rust API.
> > > > >>
> > > > >> PyO3 can build a native Python package without additional dependencies.
> > > > >> This way, users can install paimon-python simply by running pip install
> > > > >> paimon, without needing any extra setup for Java, Paimon, Flink or other
> > > > >> components.
> > > > >>
> > > > >> Are you interested in this direction?
> > > > >>
> > > > >> Some context: the Iceberg community is also working on using iceberg-rust
> > > > >> in pyiceberg directly: https://github.com/apache/iceberg-rust/pull/518
> > > >>
> > > >> On Wed, Aug 7, 2024, at 19:24, yu zelin wrote:
> > > >> > Hi devs,
> > > >> >
> > > > >> > I'd like to introduce a Python SDK for Paimon (paimon-python). Python
> > > > >> > users can use it to access Paimon data more easily.
> > > > >> >
> > > > >> > In the first version, I would leverage py4j to wrap the Java SDK with
> > > > >> > Python code. Briefly speaking, py4j can start a JVM and load Java
> > > > >> > classes, so we can use it to access the Paimon table Java API and get
> > > > >> > results in Python code. An example is flink-python:
> > > > >> > https://github.com/apache/flink/tree/master/flink-python
> > > > >> >
> > > > >> > I'd like to give a Paimon example:
> > > >> > ```
> > > >> >
> > > >> > class FileStoreTable(object):
> > > >> >
> > > >> >
> > > >> >     @classmethod
> > > >> >
> > > >> >     def create(cls, context: CatalogContext) -> 'FileStoreTable':
> > > >> >
> > > >> >         *# gateway is built via py4j to access JVM*
> > > >> >
> > > >> >         gateway = get_gateway()
> > > >> >
> > > >> >         *# use gateway.jvm to access java classes*
> > > >> >
> > > >> >         j_table =
> > > >> >
> > >
> gateway.jvm.FileStoreTableFactory.create(context.to_j_catalog_context())
> > > >> >
> > > >> >         return FileStoreTable(j_table)
> > > >> >
> > > >> >
> > > >> >     def __init__(self, j_table):
> > > >> >
> > > >> >         self.__j_table = j_table
> > > >> >
> > > >> >
> > > >> >     # wrap Java method
> > > >> >
> > > >> >     def primary_keys(self) -> List[str]:
> > > >> >
> > > >> >         return self.__j_table.primaryKeys()
> > > >> > ```
> > > > >> > Then we can wrap the scan and read interfaces to read tables, and the
> > > > >> > write and commit interfaces to write tables, via Python.
> > > >> >
> > > >> > Looking forward to your suggestions.
> > > >> >
> > > >> > Best Regards,
> > > >> > Zelin Yu
> > > >>
> > > >> --
> > > >> Xuanwo
> > > >>
> > > >> https://xuanwo.io/
> > >
> > > --
> > > Xuanwo
> > >
> > > https://xuanwo.io/
> > >
> >
>
