Okay, it seems we've agreed on using a separate repository for each
engine.

Luciano, can you create a "bahir-flink" git repository with GitHub
integration?
I'll soon open the first pull request moving an existing connector from
Flink to Bahir.
Also, there's an incoming contribution that I would probably redirect to
Bahir as well.



On Tue, Aug 16, 2016 at 2:30 PM, Ufuk Celebi <[email protected]> wrote:

> Hey all,
>
> great to see this discussion. I'm part of the Flink PMC and would love
> to see some of Flink's connectors added to Bahir. I can also help
> Robert with maintenance on the Flink side of things.
>
> +1 to multiple repo approach
>
> Best,
>
> Ufuk
>
> On Tue, Aug 16, 2016 at 2:27 PM,  <[email protected]> wrote:
> >
> > dev Digest of: thread.362
> >
> >
> > [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> >         362 by: Robert Metzger
> >         363 by: Steve Loughran
> >         370 by: Luciano Resende
> >         371 by: Robert Metzger
> >         374 by: Luciano Resende
> >         376 by: Ted Yu
> >         377 by: Robert Metzger
> >         380 by: Steve Loughran
> >         381 by: Luciano Resende
> >         382 by: Luciano Resende
> >         384 by: Robert Metzger
> >
> >
> > ---------- Forwarded message ----------
> > From: Robert Metzger <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Thu, 11 Aug 2016 10:54:17 +0200
> > Subject: [DISCUSS] Adding streaming connectors from Apache Flink to Bahir
> > Hello Bahir community,
> >
> > The Apache Flink community is currently discussing how to handle incoming
> > (streaming) connector contributions [1].
> > The Flink community wants to limit the maintained connectors to the most
> > popular ones, but we don't want to reject valuable code contributions
> > without offering a good alternative.
> > Among the options we are currently discussing is Apache Bahir.
> > From the Bahir announcement, I got the impression that the project is
> also
> > open to connectors from projects other than Apache Spark.
> >
> > Initially, we would move some of our current connectors here (redis,
> flume,
> > nifi), and there are also some pending contributions in Flink that we
> would
> > redirect to Bahir as well.
> >
> > So what's your opinion on this?
> >
> >
> > Regards,
> > Robert
> >
> >
> > [1]
> > http://mail-archives.apache.org/mod_mbox/flink-dev/201608.mbox/%3CCAGr9p8CAN8KQTM6%2B3%2B%3DNv8M3ggYEE9gSqdKaKLQiWsWsKzZ21Q%40mail.gmail.com%3E
> >
> >
> > ---------- Forwarded message ----------
> > From: Steve Loughran <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Thu, 11 Aug 2016 11:04:26 +0200
> > Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to
> Bahir
> > I can see benefits from this —provided we get some help from the Flink
> > people in maintaining and testing the stuff.
> >
> > ---------- Forwarded message ----------
> > From: Luciano Resende <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Thu, 11 Aug 2016 04:50:12 -0700
> > Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to
> Bahir
> > On Thu, Aug 11, 2016 at 2:04 AM, Steve Loughran <[email protected]>
> wrote:
> >
> >> I can see benefits from this —provided we get some help from the Flink
> >> people in maintaining and testing the stuff.
> >>
> >
> > +1, Let me know when you guys are ready and I can create a bahir-flink
> git
> > repository.
> >
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
> >
> > ---------- Forwarded message ----------
> > From: Robert Metzger <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Thu, 11 Aug 2016 14:42:33 +0200
> > Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to
> Bahir
> > @Steve: The plan is that Flink committers also help out here with
> > reviewing, releasing and other community activities (but I suspect the
> > activity will be much lower; otherwise, we would not be discussing removing
> > some of the connectors from Flink).
> >
> > @Luciano: So the idea is to have separate repositories for each project
> > contributing connectors?
> > I'm wondering if it makes sense to keep the code in the same repository to
> > have some synergies (like the release scripts, CI, documentation, a common
> > parent pom with RAT, etc.). Otherwise, it would maybe make more sense to
> > create a Bahir-style project for Flink, to avoid maintaining completely
> > disjoint codebases in the same JIRA, mailing lists, ...
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: Luciano Resende <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Thu, 11 Aug 2016 09:03:39 -0700
> > Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to
> Bahir
> > On Thu, Aug 11, 2016 at 5:42 AM, Robert Metzger <[email protected]>
> wrote:
> >
> >>
> >>
> >> @Luciano: So the idea is to have separate repositories for each project
> >> contributing connectors?
> >> I'm wondering if it makes sense to keep the code in the same repository
> to
> >> have some synergies (like the release scripts, CI, documentation, a
> common
> >> parent pom with rat etc.). Otherwise, it would maybe make more sense to
> >> create a Bahir-style project for Flink, to avoid maintaining completely
> >> disjunct codebases in the same JIRA, ML, ...
> >>
> >>
> >>
> > But we would most likely have very different release schedules for the
> > different sets of extensions, where Spark extensions will tend to follow
> > Spark release cycles, and Flink extensions will follow Flink release cycles.
> > As for the overhead, I believe the release scripts might be the one piece
> > that would be replicated, but I can volunteer to handle the infrastructure
> > overhead for now. All the rest, such as JIRA, mailing lists, etc., will be
> > common. But, anyway, I don't want to make this an obstacle to Flink bringing
> > its extensions here, so if you have a strong preference for having everything
> > in the same repo, we could start with that.
> >
> > Thoughts ?
> >
> >
> > ---------- Forwarded message ----------
> > From: Ted Yu <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Thu, 11 Aug 2016 09:13:24 -0700
> > Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to
> Bahir
> > Having Flink connectors in the same repo seems to make more sense at the
> > moment.
> >
> > Certain artifacts can be shared between the two types of connectors.
> >
> > Flink seems to have more frequent releases recently. But Bahir doesn't
> have
> > to follow each Flink patch release.
> >
> > Just my two cents.
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: Robert Metzger <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Thu, 11 Aug 2016 20:41:00 +0200
> > Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to
> Bahir
> > Thank you for your responses.
> >
> > @Luciano: I don't have a strong preference for one of the two options,
> but
> > I would like to understand the implications of the two before we start
> > setting up the infrastructure.
> > Regarding the release cycle: for the Flink connectors, I would actually try
> > to make the release cycle dependent on the connectors, not so much on Flink
> > itself. In my experience, connectors could benefit from a more frequent
> > release schedule. For example, Kafka seems to release new versions quite
> > frequently (recently), or at least the release cycles of Kafka and Flink are
> > not aligned ;)
> > So maybe it would make sense for Bahir to release independently of the engine
> > projects, on a monthly or bi-monthly schedule, with an independent
> > versioning scheme.
> >
> > @Ted: Flink has bugfix releases quite frequently, but major releases are at
> > an okay level (3-4 months in between).
> > Since 1.0.0, Flink provides interface stability, so there should not be an
> > issue with independent connector releases.
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: Steve Loughran <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Thu, 11 Aug 2016 23:18:32 +0200
> > Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to
> Bahir
> > Thinking some more
> >
> > To an extent, Bahir is currently mostly a home for some connectors and
> > things which were orphaned by the main Spark team, giving them some ASF
> > home. Luciano has been putting in lots of work getting a release out in
> > sync with the Spark release.
> >
> > I have some plans to contribute some other things related to Spark in
> > there, so again, an ASF home and a test & release process (some YARN driver
> > plugins, one for ATS integration and another I plan to write for YARN
> > registry binding). Again, some stuff unloved by the core Spark team.
> >
> > Ideally, Flink should be growing its user/dev base, recruiting everyone
> who
> > wants to get patches in and getting them to work on those JIRAs. That's
> the
> > community growth part of an ASF project. Having some orphan stuff isn't
> > ideal; it's the perennial "contrib" problem of projects.
> >
> > Hadoop had a big purge of contrib stuff in the move to Hadoop 2 & Maven,
> > though we've been adding stuff in hadoop-tools, especially related to
> > object stores and things. There's now a fairly harsh-but-needed policy
> > there: no contributions which can't be tested during a release. It's a PITA,
> > as for some code changes I need to test against AWS S3, Azure, 2x
> > OpenStack endpoints, and soon a Chinese one. We could have been harsh and
> > said "stay on GitHub", but having it in offers some benefits:
> >  -synchronized release schedule (good for Hadoop; bad if the contributors
> > want to release more frequently)
> >  -hadoop team gets some control over what's going on there.
> >  -code review process lets us improve quality; we're getting metrics in
> &c.
> >  -works well with my plan to have an explicit object store API, extending
> > FileSystem with specific and efficient blobstore ops (put(),
> > list(prefix),..)
> >  -enables us to do refactorings across all object stores
> >
> > One thing we do have there which handles object stores/filesystems even
> > outside Hadoop is a set of public compliance tests and a fairly strict
> > specification of what a filesystem is meant to do; it means we can
> handle a
> > big contrib by getting the authors to have those tests working, have
> > regression tests going. But...the bindings do need active engagement to
> > keep alive; OpenStack has suffered a bit there, and there's now some fork
> > in OpenStack itself: code follows maintenance; use drives maintenance.
> >
> > Anyway, I digress
> >
> > I've thought about this some more, and here are some points:
> >
> > -if there's mutual code and/or tests related to the Flink connectors and the
> > Spark ones, there's a very strong case for putting the code into Bahir
> > -if it's more that you need a home for things, I'd recommend you start with
> > Apache Flink, and if there are big contributions that suffer neglect then
> > it'll be time to look for a home
> >
> > In the meantime, maybe Bahir artifacts should explicitly indicate that they
> > are for Spark, e.g. bahir-spark, so as to leave the option of having, say, a
> > bahir-flink artifact at some point in the future.
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: Luciano Resende <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Fri, 12 Aug 2016 11:28:36 -0700
> > Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to
> Bahir
> > On Thu, Aug 11, 2016 at 2:18 PM, Steve Loughran <[email protected]>
> wrote:
> >
> >> Thinking some more
> >>
> >> To an extent, Bahir is currently mostly a home for some connectors and
> >> things which were orphaned by the main spark team, giving them some ASF
> >> home. Luciano has been putting in lots of work getting a release out in
> >> sync with the spark release.
> >>
> >
> > This is what originated Bahir, but we are already starting to see original
> > extensions being built by the Bahir community.
> > What we see today is a few distributed analytics platforms that focus on
> > building the runtime and maybe a few reference implementation extensions,
> > while most extensions are built by individuals in their own GitHub
> > repositories. Bahir enables these extensions to build a community around
> > them and follow Apache governance, and it is open to non-Spark extensions.
> >
> >
> >>
> >> I have some plans to contribute some other things related to spark in
> >> there, so again, an ASF home and a test & release process (some YARN
> driver
> >> plugins, for ATS integration and another I have a plan to write for YARN
> >> registry binding). Again, some stuff unloved by the core spark team.
> >>
> >> Ideally, Flink should be growing its user/dev base, recruiting everyone
> who
> >> wants to get patches in and getting them to work on those JIRAs. That's
> the
> >> community growth part of an ASF project. Having some orphan stuff isn't
> >> ideal; it's the perennial "contrib" problem of projects.(*)
> >>
> >>
> > I don't think that collaborating around Flink extensions in Bahir implies
> > that these extensions are orphans. Bahir can give a lot of flexibility to
> > these extensions. One is release flexibility, where an extension could
> > follow the connected system's release cycle (e.g. the Kafka release cycle),
> > the platform release cycle (e.g. Flink), or both, which is more complicated
> > when extensions are collocated within the platform code. Another benefit is
> > the sharing of domain expertise: Kafka experts, for example, could
> > collaborate across extensions for different platforms, etc.
> >
> >
> >> Hadoop had a big purge of contrib stuff in the move to hadoop 2 & maven,
> >> though we've been adding stuff in hadoop-tools, especially related to
> >> object stores and things. There's now a fairly harsh-but-needed policy
> >> there: no contributions which can't be tested during a release. It's a
> PITA
> >> as for some code changes I need to test against: AWS S3, Azure, 2x
> >> OpenStack endpoints and soon a chinese one. We could have been harsh and
> >> said "stay on github" but having it in offers some benefits
> >>  -synchronized release schedule (good for Hadoop; bad if the
> contributors
> >> want to release more frequently)
> >>  -hadoop team gets some control over what's going on there.
> >>  -code review process lets us improve quality; we're getting metrics in
> &c.
> >>  -works well with my plan to have an explicit object store API,
> extending
> >> FileSystem with specific and efficient blobstore ops (put(),
> >> list(prefix),..)
> >>  -enables us to do refactorings across all object stores
> >>
> >> One thing we do have there which handles object stores/filesystems even
> >> outside Hadoop is a set of public compliance tests and a fairly strict
> >> specification of what a filesystem is meant to do; it means we can
> handle a
> >> big contrib by getting the authors to have those tests working, have
> >> regression tests going. But...the bindings do need active engagement to
> >> keep alive; openstack has suffered a bit there, and there's now some
> fork
> >> in openstack itself: code follows maintenance; use drives maintenance.
> >>
> >> Anyway, I digress
> >>
> >> I've thought about this some more and here are some points
> >>
> >> -if there's mutual code and/or tests related to flink connectors and the
> >> spark ones, there's a very strong case for putting the code into bahir
> >>
> >
> > IMHO, even if there isn't, I believe there are still benefits, some of which
> > I have described above.
> >
> >
> >> -if it's more that you need a home for things, I'd recommend you start
> with
> >> Apache Flink and if there are big contributions that suffer neglect then
> >> it'll be time to look for a home
> >>
> >>
> > Well, I would say: if you need a more flexible place to host these
> > extensions, Bahir would welcome you.
> >
> > Having said that, we expect that the Flink community would be
> > responsible for maintaining these extensions with the help of the Bahir
> > community. Note that we have also defined some guidelines for retiring
> > extensions: http://bahir.apache.org/contributing-extensions/ which will be
> > used in case of orphaned code.
> >
> >
> >> in the meantime, maybe bahir artifacts should explicitly indicate that
> they
> >> are for spark, eg bahir-spark, so as to leave the option for having,
> say, a
> >> bahir-flink artifact at some point in the future.
> >>
> >
> > Currently, all artifact ids are prefixed by spark:
> > <artifactId>spark-streaming-akka_2.11</artifactId>
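> >
> > For illustration, a downstream pom could pull one of these in as below, and a
> > future Flink extension could follow the same naming pattern (the flink-
> > prefixed artifactId and the version numbers here are only placeholders):
> >
> >   <dependency>
> >     <groupId>org.apache.bahir</groupId>
> >     <artifactId>spark-streaming-akka_2.11</artifactId>
> >     <version>2.0.0</version> <!-- placeholder version -->
> >   </dependency>
> >
> >   <!-- hypothetical future artifact from a bahir-flink repository -->
> >   <dependency>
> >     <groupId>org.apache.bahir</groupId>
> >     <artifactId>flink-connector-redis_2.11</artifactId>
> >     <version>1.0</version> <!-- placeholder version -->
> >   </dependency>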
> >
> >
> >
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
> >
> > ---------- Forwarded message ----------
> > From: Luciano Resende <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Fri, 12 Aug 2016 11:34:25 -0700
> > Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to
> Bahir
> > On Thu, Aug 11, 2016 at 9:03 AM, Luciano Resende <[email protected]>
> > wrote:
> >
> >>
> >>
> >> On Thu, Aug 11, 2016 at 5:42 AM, Robert Metzger <[email protected]>
> >> wrote:
> >>
> >>>
> >>>
> >>> @Luciano: So the idea is to have separate repositories for each project
> >>> contributing connectors?
> >>> I'm wondering if it makes sense to keep the code in the same
> repository to
> >>> have some synergies (like the release scripts, CI, documentation, a
> common
> >>> parent pom with rat etc.). Otherwise, it would maybe make more sense to
> >>> create a Bahir-style project for Flink, to avoid maintaining completely
> >>> disjunct codebases in the same JIRA, ML, ...
> >>>
> >>>
> >>>
> >> But we most likely would have very different release schedules with the
> >> different set of extensions, where Spark extensions will tend to follow
> >> Spark release cycles, and Flink release cycles. As for the overhead, I
> >> believe release scripts might be the one piece that would be replicated,
> >> but I can volunteer the infrastructure overhead for now. All rest, such
> as
> >> JIRA, ML, etc will be common. But, anyway, I don't want to make this an
> >> issue for Flink to bring up the extensions here, so if you have a strong
> >> preference on having all in the same repo, we could start with that.
> >>
> >> Thoughts ?
> >>
> >>
> > I have thought more about the question of one combined repository versus
> > separate repositories per platform (e.g. Spark, Flink), and the more I think
> > about it, the more I believe separate repositories will be best. Consider
> > some of the benefits listed below:
> >
> > Multiple repositories:
> > - Enable smaller and faster builds, as you don't have to wait on the other
> > platform's extensions
> > - Simplify dependency management when different platforms use different
> > versions of dependencies
> > - Enable more flexibility in releases, permitting disruptive changes in
> > one platform without affecting the others
> > - Enable a better versioning scheme for each platform (e.g. Spark extensions
> > following the Spark release version scheme, while Flink has its own)
> > - etc.
> >
> > One repository:
> > - Enables sharing common components (which, in my view, will mostly be
> > infrastructure pieces that, once created, are fairly stable)
> >
> > Thoughts ?
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
> >
> > ---------- Forwarded message ----------
> > From: Robert Metzger <[email protected]>
> > To: [email protected]
> > Cc:
> > Date: Mon, 15 Aug 2016 14:04:09 +0200
> > Subject: Re: [DISCUSS] Adding streaming connectors from Apache Flink to
> Bahir
> > Hi,
> >
> > @stevel: Flink is still experiencing a lot of community growth. Initially,
> > we accepted all contributions in an acceptable state. Then we introduced
> > various models of "staging" and "contrib" modules, but by now the amount
> > of incoming contributions is just too high for the core project.
> > Also, it's a bit out of scope compared to the core engine we are building.
> > That's why we started looking at Bahir (and other approaches).
> >
> > @Luciano, I'll reply to the multiple vs. one repo discussion inline below.
> >
> >
> > On Fri, Aug 12, 2016 at 8:34 PM, Luciano Resende <[email protected]>
> > wrote:
> >
> >> On Thu, Aug 11, 2016 at 9:03 AM, Luciano Resende <[email protected]>
> >> wrote:
> >>
> >> Multiple Repositories:
> >> - Enable smaller and fast builds, as you don't have to wait on the other
> >> platform extensions
> >>
> >
> > True, build time is an argument for multiple repos
> >
> >
> >> - Simplify dependency management when different platforms use different
> >> levels of dependencies
> >>
> >
> > I don't think that the dependencies influence each other much.
> > For the one-repository approach, the structure would probably be like this:
> >
> > bahir-parent
> > - bahir-spark
> >     - spark-streaming-akka
> >     - ...
> > - bahir-flink
> >     - flink-connector-redis
> >     - ...
> >
> > In "bahir-parent", we could define all release-related plugins, Apache RAT,
> > Checkstyle(?), general project information, and all the other stuff that
> > makes a Bahir project "bahir" ;)
> > In the "bahir-<system>" parent, we could define all platform-specific
> > dependencies and settings.
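> >
> > A rough sketch of what such a parent pom could look like (the coordinates,
> > versions, and plugin selection below are only illustrative assumptions):
> >
> >   <!-- bahir-parent/pom.xml: shared release/RAT/checkstyle setup -->
> >   <project xmlns="http://maven.apache.org/POM/4.0.0">
> >     <modelVersion>4.0.0</modelVersion>
> >     <groupId>org.apache.bahir</groupId>
> >     <artifactId>bahir-parent</artifactId>
> >     <version>1.0.0-SNAPSHOT</version> <!-- placeholder version -->
> >     <packaging>pom</packaging>
> >
> >     <modules>
> >       <module>bahir-spark</module>  <!-- spark-streaming-akka, ... -->
> >       <module>bahir-flink</module>  <!-- flink-connector-redis, ... -->
> >     </modules>
> >
> >     <build>
> >       <plugins>
> >         <!-- license header checks shared by all platform sub-trees -->
> >         <plugin>
> >           <groupId>org.apache.rat</groupId>
> >           <artifactId>apache-rat-plugin</artifactId>
> >           <version>0.12</version> <!-- placeholder version -->
> >         </plugin>
> >       </plugins>
> >     </build>
> >   </project>
> >
> > Each "bahir-<system>" module would then be a pom-packaged parent of its own,
> > holding the platform-specific dependencies and, if needed, its own version.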
> >
> >
> >
> >> - Enable for more flexibility on releases, permitting disruptive
> changes in
> >> one platform without affecting others
> >>
> >
> > With the structure proposed above, I guess we could actually have
> > independent versioning / releases for the "bahir-<system>" parent tree.
> >
> >
> >> - Enable better versioning schema for different platforms (e.g. Spark
> >> following the Spark release version schema, while Flink having it's own
> >> schema)
> >> - etc
> >>
> >> One Repository
> >> - Enable sharing common components (which in my view will be mostly
> >> infrastructure pieces that once created are somewhat stable)
> >>
> >>
> >
> > Since you are the project PMC chair, I propose we go with the "multiple
> > repositories" approach if nobody objects within 24 hours.
> >
> > Once we have concluded our discussion here, I'll send a summary to the
> > Flink dev@ list and see what they think about it.
> > I expect them to agree with our proposal, since the "Bahir approach" is our
> > favorite.
> >
> > Regards,
> > Robert
> >
>
