Re: [DISCUSS] Incubating Proposal for Paimon

Yu Li Sun, 05 Mar 2023 23:48:07 -0800

Thanks all for the comments and positive feedback. Let me start a vote.

Best Regards,
Yu



On Thu, 2 Mar 2023 at 20:42, Robert Metzger <rmetz...@apache.org> wrote:

> Thanks for the proposal. I'm happy to act as a mentor for the project.
> I respect the desire to go through the regular incubation process, and
> maybe it is a good thing for the Paimon community to revisit some of the
> processes and customs and develop their own style, independent of Flink as
> part of the incubation process.
>
> I have no doubt regarding the technical or "community building" ability of
> the initial team.
>
>
> On Mon, Feb 27, 2023 at 2:49 PM Becket Qin <becket....@gmail.com> wrote:
>
> > I am really excited to see Paimon become an independent ASF incubation
> > project, and I am happy to be a mentor of the project.
> >
> > Re Dave,
> >
> > The plan is to let Paimon eventually graduate as a TLP by itself. The
> > project bootstrapped as a subproject of Flink because 1) it was designed to
> > provide a stream and batch unified storage which matches the vision of
> > Flink as a stream and batch unified engine and 2) the project was developed
> > by the same team that is working on Flink.
> >
> > Now that there have been a few releases, we see strong and reasonable use
> > cases from users who want Paimon (flink-table-store) to work with engines
> > other than Flink, such as Spark / Trino. Continuing to keep Paimon as a
> > subproject of Flink might unnecessarily limit the development of the project
> > and is somewhat misleading to the users. Given its scope, we believe it
> > makes a lot of sense for Paimon to get incubated on its own, independent of
> > Flink. There has been a thorough discussion[1] and vote[2] about this among
> > the Flink PMC.
> >
> > Cheers,
> >
> > Jiangjie (Becket) Qin
> >
> > [1] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
> > [2] https://lists.apache.org/thread/95wyc51rfmsqc9osc86q7zx3491m7bvt
> >
> > On Fri, Feb 24, 2023 at 12:10 PM Dave Fisher <wave4d...@comcast.net>
> > wrote:
> >
> >> An interesting proposal. Since Paimon is already part of Apache Flink,
> >> does the podling intend to graduate as its own Top Level Project? Or is
> >> the plan currently to become a subproject of Flink? I’m just curious. Were
> >> there any discussions within the Flink community about incubating Paimon?
> >>
> >> Best Regards,
> >> Dave
> >>
> >> Sent from my iPhone
> >>
> >> > On Feb 23, 2023, at 7:58 PM, Yu Li <car...@gmail.com> wrote:
> >> >
> >> > Revision: the hyperlink of the first reference is incorrect; please use
> >> > the website address directly instead of clicking it (sorry for my
> >> > mistake).
> >> >
> >> > For easier reference: https://github.com/apache/flink-table-store
> >> >
> >> > Best Regards,
> >> > Yu
> >> >
> >> >
> >> >> On Fri, 24 Feb 2023 at 11:48, Yu Li <car...@gmail.com> wrote:
> >> >>
> >> >> Hi All,
> >> >>
> >> >>
> >> >> I would like to propose Paimon [1] as a new Apache Incubator project;
> >> >> you can find more details in the Paimon proposal [2].
> >> >>
> >> >>
> >> >> Paimon is a unified lake storage to build dynamic tables for both stream
> >> >> and batch processing with big data compute engines (Apache Flink, Apache
> >> >> Spark, Apache Hive, Trino, etc.), supporting high-speed data ingestion and
> >> >> real-time data query. With the adoption of stream processing in
> >> >> production, there is an increasing demand for storage to simultaneously
> >> >> support updates, deletes and streaming reads, which cannot be fully
> >> >> satisfied by existing lake storages. To tackle these new challenges,
> >> >> Paimon natively adopts LSM (Log-Structured Merge-tree) as its underlying
> >> >> data structure, and provides enhanced performance for data with primary
> >> >> keys (besides the common lake storage capabilities). What's more, Paimon
> >> >> supports both batch and stream operations (reads and writes), facilitating
> >> >> applications pursuing batch-stream-unified semantics. Specifically:
> >> >>
> >> >>
> >> >> 1. Paimon provides excellent performance on the intensive update
> >> >> / delete workload, leveraging the append-write feature of the LSM data
> >> >> structure.
> >> >>
> >> >> 2. Paimon utilizes the ordered feature of LSM to support effective filter
> >> >> pushdown, and could reduce the latency of queries with primary key
> >> >> filtering to milliseconds.
> >> >>
> >> >> 3. Paimon supports various (row-based or row-columnar) file formats
> >> >> including Apache Avro, Apache ORC and Apache Parquet (rows will be sorted
> >> >> by the primary key before writing out).
> >> >>
> >> >> 4. Tables provided by Paimon can be queried by various engines, including
> >> >> Apache Flink, Apache Spark, Apache Hive, Trino, etc.
> >> >>
> >> >> 5. Paimon's metadata is self-managed, stored on the distributed file
> >> >> system and can be synchronized to Hive metastore (HMS).
> >> >>
> >> >> 6. Besides the common batch read and write support, Paimon also supports
> >> >> streaming read and change data feed.
> >> >>
> >> >>
> >> >>
> >> >> Paimon has been used by various users and companies, including Alibaba,
> >> >> Bilibili, ByteDance and so on. Paimon is also integrated into Alibaba
> >> >> Cloud's E-MapReduce and Realtime Compute products to provide cloud
> >> >> services.
> >> >>
> >> >>
> >> >> Paimon was founded in the Flink community in 2022 under the name
> >> >> "Flink Table Store". It has been developed for more than one year and has
> >> >> produced 4 formal releases. As its adoption expands to more computing
> >> >> engines, some ecosystem users have expressed concerns about the
> >> >> neutrality of the project. This has made us rethink the positioning of
> >> >> Flink Table Store, which can be an independent lake storage.
> >> >>
> >> >>
> >> >> After adequate discussion, we have obtained support from the Flink
> >> >> community to enter Apache incubation [3] [4], with the following
> >> >> expectations:
> >> >>
> >> >> 1. Expand Paimon's ecosystem, providing independent Java APIs to support
> >> >> reading and writing from more big data engines such as Apache Doris,
> >> >> Apache Hive, Presto, Apache Spark, Trino, etc.
> >> >>
> >> >> 2. Supplement key capabilities, especially streaming reads and intensive
> >> >> updates/deletes, for creating a unified and easy-to-use streaming data
> >> >> warehouse (lakehouse).
> >> >>
> >> >> 3. Grow into a more vibrant and neutral open source community.
> >> >>
> >> >>
> >> >> And we believe the Paimon project will provide tremendous value for the
> >> >> community if it is introduced into the Apache Incubator.
> >> >>
> >> >>
> >> >> I will help this project as the champion and mentor the project together
> >> >> with three other mentors (many thanks):
> >> >>
> >> >>
> >> >> * Becket Qin (j...@apache.org)
> >> >>
> >> >> * Robert Metzger (rmetz...@apache.org)
> >> >>
> >> >> * Stephan Ewen (se...@apache.org)
> >> >>
> >> >>
> >> >> Look forward to your feedback. Thanks.
> >> >>
> >> >>
> >> >> Best Regards,
> >> >> Yu
> >> >>
> >> >> [1] https://github.com/apache/flink-table-store
> >> >> <https://github.com/alibaba/RemoteShuffleService>
> >> >>
> >> >> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
> >> >>
> >> >> [3] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
> >> >>
> >> >> [4] https://lists.apache.org/thread/kn7c08cr4l0ynt551yfjqvzh5ns226r6
> >> >>
> >> >>
> >> >>
> >>
> >>
>
