Hi everyone,

I'm wondering what the problem would be if we committed the Pulsar
connector before the new source interface is ready. If I understand
correctly, we need to keep supporting the old source interface anyway for
the existing connectors. The benefit I see in checking it in early is that
our users could start using the connector sooner. Moreover, it would keep
the Pulsar integration from being delayed if the new source interface
slips. The only downside I see is the extra review effort and potential
fixes that might be irrelevant for the new source interface
implementation. I guess it mainly depends on how certain we are about when
the new source interface will be ready.

Cheers,
Till

On Thu, Sep 5, 2019 at 8:56 AM Becket Qin <becket....@gmail.com> wrote:

> Hi Sijie and Yijie,
>
> Thanks for sharing your thoughts.
>
> Just want to give an update on FLIP-27. Although the FLIP wiki and
> discussion thread have been quiet for some time, a few committers and
> contributors in the Flink community have actually been prototyping the
> entire thing. We have made good progress, but we want to update the FLIP
> wiki only after the whole design is verified to work, in case there are
> last-minute surprises in the implementation. I don't have an exact ETA
> yet, but I guess it is going to be within a month or so.
>
> I am happy to review the current Flink Pulsar connector and see if it would
> fit in FLIP-27. It would be good to avoid the case where we check in the
> Pulsar connector after some review effort and the new Source interface
> becomes ready shortly afterwards.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Sep 5, 2019 at 8:39 AM Yijie Shen <henry.yijies...@gmail.com>
> wrote:
>
> > Thanks for all the feedback and suggestions!
> >
> > As Sijie said, the goal of the connector has always been to provide
> > users with the latest features of both systems as soon as possible. We
> > propose to contribute the connector to Flink and hope to get more
> > suggestions and feedback from Flink experts to ensure the high quality
> > of the connector.
> >
> > For FLIP-27, we noticed it when we started reworking the connector
> > implementation for Flink 1.9, and we wanted to build a connector on top
> > of it that supports both batch and stream computing.
> > However, it has been inactive for some time, so we decided to first
> > provide a connector with most of the new features, such as the new type
> > system and the new catalog API. We will keep following the progress of
> > FLIP-27 and incorporate it into the connector as soon as possible.
> >
> > Regarding the test status of the connector, we are following the tests
> > of the other connectors in the Flink repository and aim to provide tests
> > that are as thorough as we can make them. We are also happy to receive
> > suggestions and oversight from the Flink community to keep improving the
> > stability and performance of the connector.
> >
> > Best,
> > Yijie
> >
> > On Thu, Sep 5, 2019 at 5:59 AM Sijie Guo <guosi...@gmail.com> wrote:
> > >
> > > Thanks everyone for the comments and feedback.
> > >
> > > It seems to me that the main question here is: "how can the Flink
> > > community maintain the connector?"
> > >
> > > Here are two thoughts from my side.
> > >
> > > 1) I think how and where to host this integration is less important
> > > here. I believe there are many ways to achieve it.
> > > As part of the contribution, what we are looking for is how the two
> > > communities can build a collaborative relationship around developing
> > > the integration between Pulsar and Flink. Even if we try our best to
> > > keep up with all the updates in the Flink community, we still face the
> > > fact that we have less experience with Flink than folks in the Flink
> > > community. To make sure we maintain and deliver a high-quality
> > > pulsar-flink integration to the users of both technologies, we need
> > > some help from the experts in the Flink community.
> > >
> > > 2) We have been following FLIP-27 for a while. Originally we were
> > > thinking of contributing the connectors back after integrating with the
> > > new API introduced in FLIP-27. But we decided to initiate the
> > > conversation as early as possible, because we believe there are more
> > > benefits in doing it now rather than later. As part of the contribution,
> > > it can help the Flink community understand more about Pulsar and the
> > > potential integration points. We can also help the Flink community
> > > verify the new connector API as well as other new APIs (e.g. the
> > > catalog API).
> > >
> > > Thanks,
> > > Sijie
> > >
> > > On Wed, Sep 4, 2019 at 5:24 AM Becket Qin <becket....@gmail.com>
> wrote:
> > >
> > > > Hi Yijie,
> > > >
> > > > Thanks for the interest in contributing the Pulsar connector.
> > > >
> > > > In general, I think having a Pulsar connector with strong support is a
> > > > valuable addition to Flink, so I am happy to shepherd this effort.
> > > > Meanwhile, I would also like to provide some context on recent efforts
> > > > in the Flink connector ecosystem.
> > > >
> > > > The current way Flink maintains its connectors has hit a scalability
> > > > limit. With more and more connectors coming into the Flink repo, we are
> > > > facing a few problems such as long build and testing times. To address
> > > > this, we have attempted the following:
> > > > 1. Split the connectors out into a separate repository. This is
> > > > temporarily on hold because of a potential solution to shorten the
> > > > build time.
> > > > 2. Encourage connectors to stay as ecosystem projects while Flink tries
> > > > to provide good support for functionality and compatibility tests.
> > > > Robert has driven the creation of a Flink Ecosystem project website,
> > > > which is going through some final approval process.
> > > >
> > > > Given the above efforts, it would be great to first see if we can have
> > > > the Pulsar connector as an ecosystem project with great support. It
> > > > would be good to hear how the Flink Pulsar connector is currently
> > > > tested, to see if we can learn something about maintaining it as an
> > > > ecosystem project with good quality and test coverage. If the quality
> > > > of an ecosystem project is hard to guarantee, we may as well adopt it
> > > > into the main repo.
> > > >
> > > > BTW, another ongoing effort is FLIP-27, where we are making changes to
> > > > the Flink source connector architecture and interface. This change will
> > > > likely land in 1.10. Therefore, timing-wise, if we are going to have the
> > > > Pulsar connector in the main repo, I am wondering if we should hold off
> > > > a little and let the Pulsar connector adapt to the new interface, to
> > > > avoid work that is deprecated shortly afterwards?
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Wed, Sep 4, 2019 at 4:32 PM Chesnay Schepler <ches...@apache.org>
> > > > wrote:
> > > >
> > > > > I'm quite worried that we may end up repeating history.
> > > > >
> > > > > There were already 2 attempts at contributing a Pulsar connector,
> > > > > both of which failed because no committer got involved, despite the
> > > > > contributor opening a dedicated discussion thread about the
> > > > > contribution beforehand and getting several +1's from committers.
> > > > >
> > > > > We should really make sure that if we welcome/approve such a
> > > > > contribution it will actually get the attention it deserves.
> > > > >
> > > > > As such, I'm inclined to recommend maintaining the connector outside
> > > > > of Flink. We could link to it from the documentation to give it more
> > > > > exposure.
> > > > > With the upcoming page for sharing artifacts among the community
> > > > > (what's the state of that anyway?), this may be a better option.
> > > > >
> > > > > On 04/09/2019 10:16, Till Rohrmann wrote:
> > > > > > Hi everyone,
> > > > > >
> > > > > > thanks a lot for starting this discussion, Yijie. I think the
> > > > > > Pulsar connector would be a very valuable addition since Pulsar is
> > > > > > becoming more and more popular and it would further expand Flink's
> > > > > > interoperability. Also, from a project perspective it makes sense to
> > > > > > me to place the connector in the downstream project.
> > > > > >
> > > > > > My main concern/question is how the Flink community can maintain
> > > > > > the connector. We have seen in the past that connectors are some of
> > > > > > the most actively developed components because they need to be kept
> > > > > > in sync with the external system and with Flink. Given that the
> > > > > > Pulsar community is willing to help with maintaining, improving and
> > > > > > evolving the connector, I'm optimistic that we can achieve this.
> > > > > > Hence, +1 for contributing it back to Flink.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Sep 4, 2019 at 2:03 AM Sijie Guo <guosi...@gmail.com>
> > wrote:
> > > > > >
> > > > > >> Hi Yun,
> > > > > >>
> > > > > > >> Since I was the main driver behind FLINK-9641 and FLINK-9168, let
> > > > > > >> me try to add more context on this.
> > > > > >>
> > > > > > >> FLINK-9641 and FLINK-9168 were created to bring Pulsar in as a
> > > > > > >> source and sink for Flink. The integration was done with Flink
> > > > > > >> 1.6.0. We sent out pull requests about a year ago and ended up
> > > > > > >> maintaining those connectors in Pulsar, so that Pulsar users can
> > > > > > >> use Flink to process event streams in Pulsar.
> > > > > > >> (See https://github.com/apache/pulsar/tree/master/pulsar-flink).
> > > > > > >> The Flink 1.6 integration is pretty simple and has no schema
> > > > > > >> considerations.
> > > > > >>
> > > > > > >> In the past year, we have made a lot of changes in Pulsar and
> > > > > > >> made Pulsar schema a first-class citizen in Pulsar. We also
> > > > > > >> integrated with other computing engines for processing Pulsar
> > > > > > >> event streams with Pulsar schema.
> > > > > >>
> > > > > > >> This led us to rethink how best to integrate with Flink. We then
> > > > > > >> reimplemented the pulsar-flink connectors from the ground up with
> > > > > > >> schema support, making the Table API and Catalog API first-class
> > > > > > >> citizens in the integration. As a result, in the new pulsar-flink
> > > > > > >> implementation you can register Pulsar as a Flink catalog and
> > > > > > >> query / process the event streams using Flink SQL.
> > > > > >>
> > > > > > >> This is an example of how to use Pulsar as a Flink catalog:
> > > > > > >>
> > > > > > >> https://github.com/streamnative/pulsar-flink/blob/3eeddec5625fc7dddc3f8a3ec69f72e1614ca9c9/README.md#use-pulsar-catalog
> > > > > >>
> > > > > > >> Yijie has also written a blog post explaining why we
> > > > > > >> re-implemented the Flink connector with Flink 1.9 and what changes
> > > > > > >> we made in the new connector:
> > > > > > >>
> > > > > > >> https://medium.com/streamnative/use-apache-pulsar-as-streaming-table-with-8-lines-of-code-39033a93947f
> > > > > >>
> > > > > > >> We believe Pulsar is not just a simple data sink or source for
> > > > > > >> Flink. It can actually be a fully integrated streaming data
> > > > > > >> storage for Flink in many areas (sink, source, schema/catalog and
> > > > > > >> state). The combination of Flink and Pulsar can create a great
> > > > > > >> streaming warehouse architecture for streaming-first, unified
> > > > > > >> data processing. Since we are talking about contributing the
> > > > > > >> Pulsar integration to Flink here, we are also dedicated to
> > > > > > >> maintaining, improving and evolving the integration with Flink to
> > > > > > >> help the users who use both Flink and Pulsar.
> > > > > >>
> > > > > > >> Hope this gives you a bit more background about the pulsar-flink
> > > > > > >> integration. Let me know your thoughts.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Sijie
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Sep 3, 2019 at 11:54 AM Yun Tang <myas...@live.com>
> > wrote:
> > > > > >>
> > > > > >>> Hi Yijie
> > > > > >>>
> > > > > > >>> I can see that Pulsar has become more and more popular recently,
> > > > > > >>> and I am very glad to see more people willing to contribute to
> > > > > > >>> the Flink ecosystem.
> > > > > >>>
> > > > > > >>> Before any further discussion, would you please explain the
> > > > > > >>> relationship between this thread and the existing JIRAs for the
> > > > > > >>> Pulsar source [1] and sink [2] connectors? Will the contribution
> > > > > > >>> contain part of those PRs, or is it a totally different
> > > > > > >>> implementation?
> > > > > >>>
> > > > > >>> [1] https://issues.apache.org/jira/browse/FLINK-9641
> > > > > >>> [2] https://issues.apache.org/jira/browse/FLINK-9168
> > > > > >>>
> > > > > >>> Best
> > > > > >>> Yun Tang
> > > > > >>> ________________________________
> > > > > >>> From: Yijie Shen <henry.yijies...@gmail.com>
> > > > > >>> Sent: Tuesday, September 3, 2019 13:57
> > > > > >>> To: dev@flink.apache.org <dev@flink.apache.org>
> > > > > > >>> Subject: [DISCUSS] Contribute Pulsar Flink connector back to Flink
> > > > > >>>
> > > > > >>> Dear Flink Community!
> > > > > >>>
> > > > > > >>> I would like to open the discussion of contributing the Pulsar
> > > > > > >>> Flink connector [0] back to Flink.
> > > > > >>>
> > > > > >>> ## A brief introduction to Apache Pulsar
> > > > > >>>
> > > > > > >>> Apache Pulsar[1] is a multi-tenant, high-performance distributed
> > > > > > >>> pub-sub messaging system. Pulsar includes multiple features such
> > > > > > >>> as native support for multiple clusters in a Pulsar instance,
> > > > > > >>> with seamless geo-replication of messages across clusters, very
> > > > > > >>> low publish and end-to-end latency, seamless scalability to over
> > > > > > >>> a million topics, and guaranteed message delivery with persistent
> > > > > > >>> message storage provided by Apache BookKeeper. Nowadays, Pulsar
> > > > > > >>> has been adopted by more and more companies[2].
> > > > > >>>
> > > > > >>> ## The status of Pulsar Flink connector
> > > > > >>>
> > > > > > >>> The Pulsar Flink connector we are planning to contribute is
> > > > > > >>> built upon Flink 1.9.0 and Pulsar 2.4.0. The main features are:
> > > > > > >>> - Pulsar as a streaming source with exactly-once guarantee.
> > > > > > >>> - Sink streaming results to Pulsar with at-least-once semantics.
> > > > > > >>> (We would update this to exactly-once as well when Pulsar gets
> > > > > > >>> all transaction features ready in its 2.5.0 version)
> > > > > > >>> - Built upon Flink's new Table API type system (FLIP-37[3]), and
> > > > > > >>> can automatically (de)serialize messages with the help of Pulsar
> > > > > > >>> schema.
> > > > > > >>> - Integrates with Flink's new Catalog API (FLIP-30[4]), which
> > > > > > >>> enables the use of Pulsar topics as tables in the Table API as
> > > > > > >>> well as the SQL client.
> > > > > >>>
> > > > > >>> ## Reference
> > > > > >>> [0] https://github.com/streamnative/pulsar-flink
> > > > > >>> [1] https://pulsar.apache.org/
> > > > > >>> [2] https://pulsar.apache.org/en/powered-by/
> > > > > >>> [3]
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System
> > > > > >>> [4]
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Yijie Shen
> > > > > >>>
> > > > >
> > > > >
> > > >
> >
>