Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

Ahmet Altay Wed, 29 Nov 2017 16:41:33 -0800

My wishlist for 2018 would be

- Python 3 support
- Python SDK to work with more runners. This is covered in portability in
general. I would like to see an enterprise grade Python SDK that can run on
a range of Beam runners.
- Related to the above item, full streaming support with Python SDK.
- Python SDK to catch up on missing features, from larger APIs such as
the State API to smaller things like setup/teardown support in DoFn.
- Interactive support, perhaps integrations with related projects like
Apache Zeppelin.



On Wed, Nov 29, 2017 at 5:19 AM, Ismaël Mejía <[email protected]> wrote:

> It is good to see so much enthusiasm about the future of Beam
> regardless of whether we call it Beam 3 or not.
>
> I have some doubts about the idea of a release per month. Apache
> releases are designed to be slow-paced (via the 3-day voting process).
> If the same month has a holiday period plus some issues during the
> release that require two RCs, it will easily take two weeks (of course
> I understand the wish to improve on this, considering our not so good
> status quo of 6 weeks for the last two votes). My point is that a
> monthly release can bring a ton of extra work to validate every
> release; remember that validating a release is not just running the
> unit tests.
>
> I want to add one idea to the wishlist for Beam in the future:
>
> - We need to improve Beam’s monitorability in a unified way even if
> this goes beyond the initial goals of the project because this is a
> big pain point for Beam adopters. We need things like system metrics
> and utilities to monitor what is going on with Beam pipelines in a
> runner-agnostic way.
>
> It would be nice to create JIRAs for the issues discussed in this
> thread (those that don’t exist yet); with these we can track them and
> organize them into some sort of roadmap.
>
>
> On Wed, Nov 29, 2017 at 7:05 AM, Romain Manni-Bucau
> <[email protected]> wrote:
> > PS: I forgot another wish: make Beam SQL usable. Today you need to add a fn
> > before and after because of the type breakage, which is not consistent with
> > the pipeline API. It would be nice to support POJOs (extracted from the
> > select fields or created from "views" like in Jackson), but not having to
> > wrap the SQL usage in multiple UDFs would make it powerful and ready to use.
> >
> > On 29 Nov 2017 07:01, "Romain Manni-Bucau" <[email protected]> wrote:
> >>
> >> My user wishes - whatever version, it is just a number after all ;):
> >>
> >> - make coder usage simpler and consistent (PCollection TypeDescriptor
> >> and Coder are duplicated in terms of API)
> >> - have a beam api (split from the sdk and internals and impl)
> >> - have SDF supported by runners
> >> - have a SDFRunner allowing to simulate the SDF lifecycle manually (same
> >> for DoFn short term - see next point for the current issue)
> >> - ensure classloader usage is consistent, i.e. any proxy is created in
> >> the final artifact classloader (transform if custom, dofn/source/sdf
> >> otherwise)
> >> - have a test compatibility kit (TCK) for runner. It would be a jar any
> >> runner impl can import to run with surefire
> >> - make IO configuration reflection friendly (get rid of the autovalue
> >> pattern, which is not industrializable, and allow POJO-like classes or
> >> alternatively support reading the conf from properties)
> >> - support pipeline implicit option based on transform names to override
> >> some attributes
> >> - change runner implementations to let the bundle size have a pipeline
> >> option defining an upper bound and not hardcode them arbitrarily -
> >> defaults can stay the current ones
> >> - better multi input/output support (just PCollection based and fully
> >> wireable)
> >> - a smoother pipeline API would be nice. I like hazelcast jet one for
> >> instance
> >>
> >> On 29 Nov 2017 03:29, "Robert Bradshaw" <[email protected]> wrote:
> >>>
> >>> On Tue, Nov 28, 2017 at 9:48 AM, Reuven Lax <[email protected]> wrote:
> >>> >
> >>> > On Tue, Nov 28, 2017 at 9:14 AM, Jean-Baptiste Onofré <[email protected]>
> >>> > wrote:
> >>> >>
> >>> >> Hi Reuven,
> >>> >>
> >>> >> Yes, I remember that we agreed on a release per month. However, we
> >>> >> didn't do it before. I think the most important thing is not the
> >>> >> period but a stable pace. I think it's more interesting for our
> >>> >> community to "always" have a release every two months than to attempt
> >>> >> a release every month that ends up later than that. Of course, if we
> >>> >> can do both, it's perfect ;)
> >>> >
> >>> > Agree. A stable pace is the most important thing.
> >>>
> >>> +1, and I think everyone who's done a release is in favor of making it
> >>> easier and more frequent. Someone should put together a proposal of
> >>> easy things we can do to automate, etc.
> >>>
> >>> >> For Beam 3.x, I wasn't talking about breaking change, but more about
> >>> >> "marketing" announcement. I think that, even if we don't break API,
> >>> >> some
> >>> >> features are "strong enough" to be "qualified" in a major version.
> >>> >
> >>> > Ah, good point. This doesn't stop us from checking in these new
> >>> > features into 2.x possibly tagged with an @Experimental flag. We can
> >>> > then use 3.0 to announce all these features more broadly, and remove
> >>> > @Experimental tags.
> >>> >
> >>> > I would also like to see enterprise-ready BeamSQL and Java 7
> >>> > deprecation on the list for Beam 3.0
> >>> >
> >>> >>
> >>> >> I think that any major ideas & features (breaking the API or not) are
> >>> >> valuable for Beam 3.x (and it's a good sign for our community again
> >>> >> ;)).
> >>>
> >>> I'm generally not a fan of bumping the major version number just
> >>> because enough time has passed, or enough new features have gone in
> >>> (and am mostly opposed to holding features back just because we want
> >>> to announce them (simultaneously?) in a big release)--instead I find
> >>> that the need for a new major version arises out of a realization that
> >>> the model has sufficiently changed and we need to cut ties with the
> >>> old way of doing things (that's perhaps holding us back). That being
> >>> said, it could be that some of these features are large enough to
> >>> merit this.
> >>>
> >>> Regardless of the naming, I think it's a great time to have a
> >>> discussion of where we want to go in 2018.
> >>>
> >>> Top of my list is first class support for Schema'd PCollections (and
> >>> with it SQL support, etc.) and full support of the portability
> >>> framework realizing the possibility of every runner running every SDK
> >>> (and, ideally, even cross-SDK/language pipelines). I would also like
> >>> to see explorations into interactive/incremental (for Python at least,
> >>> but probably Java as well).
> >>>
> >>> - Robert
> >>>
> >>>
> >>> >> On 11/28/2017 06:09 PM, Reuven Lax wrote:
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> On Tue, Nov 28, 2017 at 8:55 AM, Jean-Baptiste Onofré
> >>> >>> <[email protected]> wrote:
> >>> >>>
> >>> >>>     Hi guys,
> >>> >>>
> >>> >>>     Even if there's no rush, I think it would be great for the
> >>> >>>     community to have a better view of our roadmap and where we are
> >>> >>>     going in terms of schedule.
> >>> >>>
> >>> >>>     I would like to discuss the following:
> >>> >>>     - a best effort to maintain a good release pace or at least
> >>> >>>     provide a rough schedule. For instance, in Apache Karaf, I have a
> >>> >>>     release schedule
> >>> >>>     (http://karaf.apache.org/download.html#container-schedule). I
> >>> >>>     think a release ~ every quarter would be great.
> >>> >>>
> >>> >>>
> >>> >>> Originally we had stated that we wanted monthly releases of Beam. So
> >>> >>> far the releases have been painful enough that monthly hasn't
> >>> >>> happened. I think we should address these issues and go to monthly
> >>> >>> releases as originally stated.
> >>> >>>
> >>> >>>     - if I see new Beam 2.x releases for sure (according to the
> >>> >>>     previous point), it would be great to have a discussion about
> >>> >>>     Beam 3.x. I think that one of the interesting new features that
> >>> >>>     Beam 3.x can provide is around PCollection with Schemas. It's
> >>> >>>     something that we started to discuss with Reuven and Eugene.
> >>> >>>     In term of schedule,
> >>> >>>
> >>> >>>
> >>> >>> I don't think schemas require Beam 3.0 - I think we can introduce
> >>> >>> them without making breaking changes. However there are many other
> >>> >>> features that would be very interesting for Beam 3.x, and we should
> >>> >>> start putting together a list of them. I
> >>> >>>
> >>> >>>
> >>> >>>     I would love to see your thoughts & ideas about the release
> >>> >>>     schedule and Beam 3.x.
> >>> >>>
> >>> >>>     Regards
> >>> >>>     JB
> >>> >>>     --
> >>> >>>     Jean-Baptiste Onofré
> >>> >>>     [email protected]
> >>> >>>     http://blog.nanthrax.net
> >>> >>>     Talend - http://www.talend.com
> >>> >>>
> >>> >>>
> >>> >>
> >>> >> --
> >>> >> Jean-Baptiste Onofré
> >>> >> [email protected]
> >>> >> http://blog.nanthrax.net
> >>> >> Talend - http://www.talend.com
> >>> >
> >>> >
>
