Thanks everyone for the comments and feedback.

It seems to me that the main question here is: "How can the Flink
community maintain the connector?"

Here are two thoughts of mine.

1) I think how and where to host this integration is less important here;
I believe there are many ways to achieve it.
As part of the contribution, what we are looking for is a way for the two
communities to build a collaborative relationship around developing
the integration between Pulsar and Flink. Even if we try our best to keep
up with all the updates in the Flink community, we still have
less experience with Flink than the folks in the Flink
community. To make sure we maintain and deliver
a high-quality pulsar-flink integration to the users of both
technologies, we need help from the experts in the Flink community.

2) We have been following FLIP-27 for a while. Originally we were thinking
of contributing the connectors back after integrating with the
new API introduced in FLIP-27, but we decided to initiate the conversation
as early as possible because we believe there are more benefits to doing
it now rather than later. As part of the contribution, it can help the Flink
community understand more about Pulsar and the potential integration points.
We can also help the Flink community verify the new connector API as well
as other new APIs (e.g. the catalog API).

Thanks,
Sijie

On Wed, Sep 4, 2019 at 5:24 AM Becket Qin <becket....@gmail.com> wrote:

> Hi Yijie,
>
> Thanks for the interest in contributing the Pulsar connector.
>
> In general, I think having a Pulsar connector with strong support is a
> valuable addition to Flink. So I am happy to shepherd this effort.
> Meanwhile, I would also like to provide some context and recent efforts on
> the Flink connectors ecosystem.
>
> The current way Flink maintains its connector has hit the scalability bar.
> With more and more connectors coming into Flink repo, we are facing a few
> problems such as long build and testing time. To address this problem, we
> have attempted to do the following:
> 1. Split out the connectors into a separate repository. This is temporarily
> on hold due to a potential solution for shortening the build time.
> 2. Encourage the connectors to stay as ecosystem projects while Flink tries
> to provide good support for functionality and compatibility tests. Robert
> has driven the creation of a Flink Ecosystem project website and it is
> going through some final approval process.
>
> Given the above efforts, it would be great to first see if we can have
> Pulsar connector as an ecosystem project with great support. It would be
> good to hear how the Flink Pulsar connector is tested currently to see if
> we can learn something to maintain it as an ecosystem project with good
> quality and test coverage. If the quality as an ecosystem project is hard
> to guarantee, we may as well adopt it into the main repo.
>
> BTW, another ongoing effort is FLIP-27 where we are making changes to the
> Flink source connector architecture and interface. This change will likely
> land in 1.10. Therefore, timing-wise, if we are going to have the Pulsar
> connector in the main repo, I am wondering if we should hold off a little
> and let the Pulsar connector adapt to the new interface to avoid work that
> will shortly be deprecated?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Sep 4, 2019 at 4:32 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
> > I'm quite worried that we may end up repeating history.
> >
> > There were already 2 attempts at contributing a pulsar connector, both
> > of which failed because no committer was getting involved, despite the
> > contributor opening a dedicated discussion thread about the contribution
> > beforehand and getting several +1's from committers.
> >
> > We should really make sure that if we welcome/approve such a
> > contribution it will actually get the attention it deserves.
> >
> > As such, I'm inclined to recommend maintaining the connector outside of
> > Flink. We could link to it from the documentation to give it more
> > exposure.
> > With the upcoming page for sharing artifacts among the community (what's
> > the state of that anyway?), this may be a better option.
> >
> > On 04/09/2019 10:16, Till Rohrmann wrote:
> > > Hi everyone,
> > >
> > > thanks a lot for starting this discussion Yijie. I think the Pulsar
> > > connector would be a very valuable addition since Pulsar is becoming
> > > more and more popular and it would further expand Flink's
> > > interoperability. Also from a project perspective it makes sense for me
> > > to place the connector in the downstream project.
> > >
> > > My main concern/question is how can the Flink community maintain the
> > > connector? We have seen in the past that connectors are some of the
> > > most actively developed components because they need to be kept in sync
> > > with the external system and with Flink. Given that the Pulsar community
> > > is willing to help with maintaining, improving and evolving the
> > > connector, I'm optimistic that we can achieve this. Hence, +1 for
> > > contributing it back to Flink.
> > >
> > > Cheers,
> > > Till
> > >
> > >
> > >
> > > On Wed, Sep 4, 2019 at 2:03 AM Sijie Guo <guosi...@gmail.com> wrote:
> > >
> > >> Hi Yun,
> > >>
> > >> Since I was the main driver behind FLINK-9641 and FLINK-9168, let me
> > >> try to add more context on this.
> > >>
> > >> FLINK-9641 and FLINK-9168 were created to bring Pulsar in as a source
> > >> and sink for Flink. The integration was done with Flink 1.6.0. We sent
> > >> out pull requests about a year ago and ended up maintaining those
> > >> connectors in Pulsar so that Pulsar users can use Flink to process
> > >> event streams in Pulsar.
> > >> (See https://github.com/apache/pulsar/tree/master/pulsar-flink). The
> > >> Flink 1.6 integration is pretty simple and has no schema
> > >> considerations.
> > >>
> > >> In the past year, we have made a lot of changes in Pulsar and made
> > >> Pulsar schema a first-class citizen in Pulsar. We also integrated
> > >> with other computing engines for processing Pulsar event streams with
> > >> Pulsar schema.
> > >>
> > >> It led us to rethink how to integrate with Flink in the best way. We
> > >> then reimplemented the pulsar-flink connectors from the ground up with
> > >> schema support and made the Table API and Catalog API first-class
> > >> citizens in the integration. As a result, in the new pulsar-flink
> > >> implementation, you can register Pulsar as a Flink catalog and query /
> > >> process the event streams using Flink SQL.
> > >>
> > >> This is an example of how to use Pulsar as a Flink catalog:
> > >>
> > >>
> >
> https://github.com/streamnative/pulsar-flink/blob/3eeddec5625fc7dddc3f8a3ec69f72e1614ca9c9/README.md#use-pulsar-catalog
> > >>
> > >> Yijie has also written a blog post explaining why we re-implemented the
> > >> Flink connector with Flink 1.9 and what changes we made in the new
> > >> connector:
> > >>
> > >>
> >
> https://medium.com/streamnative/use-apache-pulsar-as-streaming-table-with-8-lines-of-code-39033a93947f
> > >>
> > >> We believe Pulsar is not just a simple data sink or source for Flink.
> > >> It can actually be a fully integrated streaming data store for Flink
> > >> in many areas (sink, source, schema/catalog and state). The combination
> > >> of Flink and Pulsar can create a great streaming warehouse architecture
> > >> for streaming-first, unified data processing. Since we are talking
> > >> about contributing the Pulsar integration to Flink here, we are also
> > >> dedicated to maintaining, improving and evolving the integration with
> > >> Flink to help the users who use both Flink and Pulsar.
> > >>
> > >> Hope this gives you a bit more background about the pulsar-flink
> > >> integration. Let me know your thoughts.
> > >>
> > >> Thanks,
> > >> Sijie
> > >>
> > >>
> > >> On Tue, Sep 3, 2019 at 11:54 AM Yun Tang <myas...@live.com> wrote:
> > >>
> > >>> Hi Yijie
> > >>>
> > >>> I can see that Pulsar has become more and more popular recently, and
> > >>> I am very glad to see more people willing to contribute to the Flink
> > >>> ecosystem.
> > >>>
> > >>> Before any further discussion, would you please explain the
> > >>> relationship between this thread and the existing JIRAs for the Pulsar
> > >>> source [1] and sink [2] connectors? Will the contribution contain part
> > >>> of those PRs, or is it a totally different implementation?
> > >>>
> > >>> [1] https://issues.apache.org/jira/browse/FLINK-9641
> > >>> [2] https://issues.apache.org/jira/browse/FLINK-9168
> > >>>
> > >>> Best
> > >>> Yun Tang
> > >>> ________________________________
> > >>> From: Yijie Shen <henry.yijies...@gmail.com>
> > >>> Sent: Tuesday, September 3, 2019 13:57
> > >>> To: dev@flink.apache.org <dev@flink.apache.org>
> > >>> Subject: [DISCUSS] Contribute Pulsar Flink connector back to Flink
> > >>>
> > >>> Dear Flink Community!
> > >>>
> > >>> I would like to open the discussion of contributing Pulsar Flink
> > >>> connector [0] back to Flink.
> > >>>
> > >>> ## A brief introduction to Apache Pulsar
> > >>>
> > >>> Apache Pulsar[1] is a multi-tenant, high-performance distributed
> > >>> pub-sub messaging system. Pulsar includes multiple features such as
> > >>> native support for multiple clusters in a Pulsar instance, with
> > >>> seamless geo-replication of messages across clusters, very low
> > >>> publish and end-to-end latency, seamless scalability to over a
> > >>> million topics, and guaranteed message delivery with persistent
> > >>> message storage provided by Apache BookKeeper. Nowadays, Pulsar has
> > >>> been adopted by more and more companies[2].
> > >>>
> > >>> ## The status of Pulsar Flink connector
> > >>>
> > >>> The Pulsar Flink connector we are planning to contribute is built
> > >>> upon Flink 1.9.0 and Pulsar 2.4.0. The main features are:
> > >>> - Pulsar as a streaming source with exactly-once guarantee.
> > >>> - Sink streaming results to Pulsar with at-least-once semantics. (We
> > >>> would update this to exactly-once as well when Pulsar gets all
> > >>> transaction features ready in its 2.5.0 version)
> > >>> - Built upon Flink's new Table API type system (FLIP-37[3]), and can
> > >>> automatically (de)serialize messages with the help of Pulsar schema.
> > >>> - Integrated with Flink's new Catalog API (FLIP-30[4]), which enables
> > >>> the use of Pulsar topics as tables in the Table API as well as the
> > >>> SQL client.
> > >>>
> > >>> ## Reference
> > >>> [0] https://github.com/streamnative/pulsar-flink
> > >>> [1] https://pulsar.apache.org/
> > >>> [2] https://pulsar.apache.org/en/powered-by/
> > >>> [3]
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System
> > >>> [4]
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs
> > >>>
> > >>> Best,
> > >>> Yijie Shen
> > >>>
> >
> >
>
