My wishlist for 2018 would be:

- Python 3 support.
- Python SDK to work with more runners. This is covered in portability in
  general. I would like to see an enterprise-grade Python SDK that can run
  on a range of Beam runners.
- Related to the above item, full streaming support with the Python SDK.
- Python SDK to catch up on missing features, from larger APIs such as the
  State API to smaller things like setup/teardown support in DoFn.
- Interactive support, perhaps integrations with related projects like
  Apache Zeppelin.

On Wed, Nov 29, 2017 at 5:19 AM, Ismaël Mejía <[email protected]> wrote:
> It is good to see so much enthusiasm about the future of Beam,
> independently of whether we call it Beam 3 or not.
>
> I have some doubts about the idea of a release per month: Apache
> releases are designed to be slow-paced (via the 3-day voting process).
> It is enough that we have, in the same month, some holiday period +
> some issues during the release that require two RCs, and it will
> easily take two weeks (of course I understand the will to improve
> this considering our not so good status quo of 6 weeks for the last two
> votes). My point is that a monthly release can bring a ton of extra
> work to validate every release; remember, validating a release is not
> just running the unit tests.
>
> I want to add one idea to the wishlist for Beam in the future:
>
> - We need to improve Beam’s monitorability in a unified way, even if
> this goes beyond the initial goals of the project, because this is a
> big pain point for Beam adopters. We need things like system metrics
> and utilities to monitor what is going on with Beam pipelines in a
> runner-agnostic way.
>
> It would be nice to create JIRAs for the issues discussed in this
> thread (that don’t exist yet) so that we can follow them and
> organize them into some sort of roadmap.
>
>
> On Wed, Nov 29, 2017 at 7:05 AM, Romain Manni-Bucau
> <[email protected]> wrote:
> > PS: forgot another wish: make Beam SQL usable. Today you need to add a fn
> > before and after because of the type breakage, which is not consistent
> > with the pipeline API. It would be nice to support POJOs (extracted from
> > the select fields or created from "views" like in Jackson), but not having
> > to wrap the SQL usage in multiple UDFs would make it powerful and ready
> > to use.
> >
> > On Nov 29, 2017 07:01, "Romain Manni-Bucau" <[email protected]> wrote:
> >>
> >> My user wishes - whatever version, it is just a number after all ;):
> >>
> >> - make coder usage simpler and consistent (PCollection TypeDescriptor and
> >> Coder are duplicated in terms of API)
> >> - have a beam api (split from the sdk and internals and impl)
> >> - have SDF supported by runners
> >> - have an SDFRunner allowing to simulate the SDF lifecycle manually (same
> >> for DoFn short term - see next point for the current issue)
> >> - ensure classloader usage is consistent, i.e. any proxy is created in the
> >> final artifact classloader (transform if custom, dofn/source/sdf otherwise)
> >> - have a test compatibility kit (TCK) for runners. It would be a jar any
> >> runner impl can import to run with surefire
> >> - make IO configuration reflection friendly (get rid of the autovalue
> >> pattern, which is not industrializable, and allow pojo-like classes or
> >> alternatively support reading the conf from properties)
> >> - support implicit pipeline options based on transform names to override
> >> some attributes
> >> - change runner implementations so that a pipeline option defines an upper
> >> bound on the bundle size instead of hardcoding it arbitrarily - defaults
> >> can stay the current ones
> >> - better multi input/output support (just PCollection based and fully
> >> wireable)
> >> - a smoother pipeline API would be nice. I like the Hazelcast Jet one for
> >> instance
> >>
> >> On Nov 29, 2017 03:29, "Robert Bradshaw" <[email protected]> wrote:
> >>>
> >>> On Tue, Nov 28, 2017 at 9:48 AM, Reuven Lax <[email protected]> wrote:
> >>> >
> >>> > On Tue, Nov 28, 2017 at 9:14 AM, Jean-Baptiste Onofré <[email protected]>
> >>> > wrote:
> >>> >>
> >>> >> Hi Reuven,
> >>> >>
> >>> >> Yes, I remember that we agreed on a release per month. However, we
> >>> >> didn't do it before.
> >>> >> I think the most important thing is not the period, it's more a
> >>> >> stable pace. I think it's more interesting for our community to
> >>> >> "always" have a release every two months than an attempt at a
> >>> >> release every month that ends up later than that. Of course, if we
> >>> >> can do both, it's perfect ;)
> >>> >
> >>> > Agree. A stable pace is the most important thing.
> >>>
> >>> +1, and I think everyone who's done a release is in favor of making it
> >>> easier and more frequent. Someone should put together a proposal of
> >>> easy things we can do to automate, etc.
> >>>
> >>> >> For Beam 3.x, I wasn't talking about breaking changes, but more about
> >>> >> a "marketing" announcement. I think that, even if we don't break the
> >>> >> API, some features are "strong enough" to be "qualified" in a major
> >>> >> version.
> >>> >
> >>> > Ah, good point. This doesn't stop us from checking in these new
> >>> > features into 2.x, possibly tagged with an @Experimental flag. We can
> >>> > then use 3.0 to announce all these features more broadly, and remove
> >>> > @Experimental tags.
> >>> >
> >>> > I would also like to see enterprise-ready BeamSQL and Java 7
> >>> > deprecation on the list for Beam 3.0
> >>> >
> >>> >>
> >>> >> I think that any major idea & feature (breaking the API or not) is
> >>> >> valuable for Beam 3.x (and it's a good sign for our community again
> >>> >> ;)).
> >>>
> >>> I'm generally not a fan of bumping the major version number just
> >>> because enough time has passed, or enough new features have gone in
> >>> (and am mostly opposed to holding features back just because we want
> >>> to announce them (simultaneously?) in a big release)--instead I find
> >>> that the need for a new major version arises out of a realization that
> >>> the model has sufficiently changed and we need to cut ties with the
> >>> old way of doing things (that's perhaps holding us back).
> >>> That being said, it could be that some of these features are large
> >>> enough to merit this.
> >>>
> >>> Regardless of the naming, I think it's a great time to have a
> >>> discussion of where we want to go in 2018.
> >>>
> >>> Top of my list is first class support for Schema'd PCollections (and
> >>> with it SQL support, etc.) and full support of the portability
> >>> framework realizing the possibility of every runner running every SDK
> >>> (and, ideally, even cross-SDK/language pipelines). I would also like
> >>> to see explorations into interactive/incremental (for Python at least,
> >>> but probably Java as well).
> >>>
> >>> - Robert
> >>>
> >>>
> >>> >> On 11/28/2017 06:09 PM, Reuven Lax wrote:
> >>> >>>
> >>> >>> On Tue, Nov 28, 2017 at 8:55 AM, Jean-Baptiste Onofré
> >>> >>> <[email protected]> wrote:
> >>> >>>
> >>> >>>     Hi guys,
> >>> >>>
> >>> >>>     Even if there's no rush, I think it would be great for the
> >>> >>>     community to have a better view on our roadmap and where we
> >>> >>>     are going in terms of schedule.
> >>> >>>
> >>> >>>     I would like to discuss the following:
> >>> >>>     - a best effort to maintain a good release pace or at least
> >>> >>>     provide a rough schedule. For instance, in Apache Karaf, I
> >>> >>>     have a release schedule
> >>> >>>     (http://karaf.apache.org/download.html#container-schedule).
> >>> >>>     I think a release ~ every quarter would be great.
> >>> >>>
> >>> >>>
> >>> >>> Originally we had stated that we wanted monthly releases of Beam. So
> >>> >>> far the releases have been painful enough that monthly hasn't
> >>> >>> happened. I think we should address these issues and go to monthly
> >>> >>> releases as originally stated.
> >>> >>>
> >>> >>>     - if I see new Beam 2.x releases for sure (according to the
> >>> >>>     previous point), it would be great to have a discussion about
> >>> >>>     Beam 3.x. I think that one of the interesting new features that
> >>> >>>     Beam 3.x can provide is around PCollection with Schemas. It's
> >>> >>>     something that we started to discuss with Reuven and Eugene.
> >>> >>>     In term of schedule,
> >>> >>>
> >>> >>>
> >>> >>> I don't think schemas require Beam 3.0 - I think we can introduce
> >>> >>> them without making breaking changes. However there are many other
> >>> >>> features that would be very interesting for Beam 3.x, and we should
> >>> >>> start putting together a list of them. I
> >>> >>>
> >>> >>>
> >>> >>>     I would love to see your thoughts & ideas about the release
> >>> >>>     schedule and Beam 3.x.
> >>> >>>
> >>> >>>     Regards
> >>> >>>     JB
> >>> >>>     --
> >>> >>>     Jean-Baptiste Onofré
> >>> >>>     [email protected]
> >>> >>>     http://blog.nanthrax.net
> >>> >>>     Talend - http://www.talend.com
> >>> >>>
> >>> >>
> >>> >> --
> >>> >> Jean-Baptiste Onofré
> >>> >> [email protected]
> >>> >> http://blog.nanthrax.net
> >>> >> Talend - http://www.talend.com
