Hi Neville,

I don't know how up to date this roadmap is, but it comes from "Apache Beam: Technical Vision": https://docs.google.com/presentation/d/1E9seGPB_VXtY_KZP4HngDPTbsu5RVZFFaTlwEYa88Zw/edit#slide=id.g108d3a202f_0_287
And for more details: https://docs.google.com/document/d/1UyAeugHxZmVlQ5cEWo_eOPgXNQA1oD-rGooWOSwAqh8/edit#heading=h.ywcvt1a9xcx1

On 26 March 2016 at 06:53, Jean-Baptiste Onofré wrote:
> Hi Neville,
>
> that's great news, and the timeline is perfect!
>
> We are working on some refactoring & polishing on our side (Runner API,
> etc.), so one or two months is not a big deal!
>
> Let me know if I can help in any way.
>
> Thanks,
> Regards
> JB
>
> On 03/25/2016 08:03 PM, Neville Li wrote:
>> Thanks guys. Yes, we'd love to donate the project, but we would also like
>> to polish the API a bit first, probably over the next month or two. What's
>> the timeline like for Beam and related projects?
>>
>> I will also read the technical docs and follow up later.
>>
>> On Fri, Mar 25, 2016, 12:55 AM Ismaël Mejía wrote:
>>> Hello Neville,
>>>
>>> First, congratulations guys, excellent job / API; the Scalding touches are
>>> pretty neat (as is the Tap abstraction). I am also new to Beam, so
>>> believe me, you guys already know more than me.
>>>
>>> In my comment I mentioned sessions, referring to session windows, but that
>>> was my mistake: I had only taken a quick look at your code and initially
>>> didn't see them. Anyway, if you are interested in the model, there is a
>>> good description of the current capabilities of the runners on the website:
>>>
>>> https://beam.incubator.apache.org/capability-matrix/
>>>
>>> And the new additions to the model are openly discussed on the mailing
>>> list and in the technical docs (e.g. lateness):
>>>
>>> https://goo.gl/ps8twC
>>>
>>> -Ismaël
>>>
>>> On Fri, Mar 25, 2016 at 8:36 AM, Neville Li wrote:
>>>> Thanks for the interest, guys. I'm really excited about all the feedback
>>>> from the community.
>>>>
>>>> A little background: we developed Scio to bring Google Cloud Dataflow
>>>> closer to the Scalding/Spark ecosystem that our developers are familiar
>>>> with, while bringing some missing pieces to the table (type-safe
>>>> BigQuery, HDFS, and a REPL, to name a few).
>>>>
>>>> I have to admit that I'm pretty new to Beam development, but I would love
>>>> to get feedback and advice on how to bring Scio closer to the Beam feature
>>>> set and semantics. Scio doesn't have to live in the Beam code base just
>>>> yet (we're still under heavy development), but I'd like to see it as a de
>>>> facto Scala API endorsed by the Beam community.
>>>>
>>>> @Ismaël: I'm curious, what's this session thing you're referring to?
>>>>
>>>> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry wrote:
>>>>> +Neville and Rafal for their take ;-)
>>>>>
>>>>> Excited to see this out. Multiple community-driven SDKs are right in
>>>>> line with our goals for Beam.
>>>>>
>>>>> On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía wrote:
>>>>>> Addendum: actually, the semantic model support is not as far off as I
>>>>>> said before (I hadn't finished reading and thought they didn't support
>>>>>> sessions), and judging by the git history, the project is not that
>>>>>> young either, and it is quite active.
>>>>>>
>>>>>> On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I just checked the code a bit, and what they have done is interesting.
>>>>>>> The SCollection wrapper is worth a look, as are the examples, to get an
>>>>>>> idea of their intentions. The fact that the code looks so Spark-like
>>>>>>> (distributed-collection style) is quite interesting too:
>>>>>>>
>>>>>>> val (sc, args) = ContextAndArgs(cmdlineArgs)
>>>>>>> sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
>>>>>>>   .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
>>>>>>>   .countByValue()
>>>>>>>   .map(t => t._1 + ": " + t._2)
>>>>>>>   .saveAsTextFile(args("output"))
>>>>>>> sc.close()
>>>>>>>
>>>>>>> They have a REPL, and since the project is a bit young they don't
>>>>>>> support all the advanced semantics of Beam. They also have a Hadoop
>>>>>>> file sink/source. I think it would be nice to work with them, but if
>>>>>>> that is not possible, I at least think it is worth coordinating some
>>>>>>> sharing, e.g. in the sink/source area and other extensions.
>>>>>>>
>>>>>>> Additionally, their code is also under the Apache license.
>>>>>>>
>>>>>>> On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré wrote:
>>>>>>>> Hi Raghu,
>>>>>>>>
>>>>>>>> I agree: we should provide SDKs in different languages, and DSLs for
>>>>>>>> specific use cases.
>>>>>>>>
>>>>>>>> You got why I sent my proposal ;)
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
>>>>>>>>> I would love to see a Scala API properly supported. I didn't know
>>>>>>>>> about Scio.
>>>>>>>>> Scala is such a natural fit for the Dataflow API.
>>>>>>>>>
>>>>>>>>> I am not sure of the policy w.r.t. where such packages would live in
>>>>>>>>> the Beam repo, but I personally would write my Dataflow applications
>>>>>>>>> in Scala. It is probably already the case, but my request would be:
>>>>>>>>> it should be as thin as reasonably possible (that might make it a bit
>>>>>>>>> less like the Scalding/Spark API in some cases, which I think is a
>>>>>>>>> good compromise).
>>>>>>>>>
>>>>>>>>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré wrote:
>>>>>>>>>> Hi beamers,
>>>>>>>>>>
>>>>>>>>>> Right now, Beam provides a Java SDK.
>>>>>>>>>>
>>>>>>>>>> AFAIK, very soon, you should have the Python SDK ;)
>>>>>>>>>>
>>>>>>>>>> Spotify created a Scala API on top of the Google Dataflow SDK:
>>>>>>>>>>
>>>>>>>>>> https://github.com/spotify/scio
>>>>>>>>>>
>>>>>>>>>> What do you think of asking them whether they want to donate it as
>>>>>>>>>> the Beam Scala SDK? I planned to work on a Scala SDK, but since
>>>>>>>>>> something already seems to exist, it makes sense to leverage it.
>>>>>>>>>>
>>>>>>>>>> Thoughts?
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>> --
>>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jean-Baptiste Onofré
>>>>>>>> http://blog.nanthrax.net
>>>>>>>> Talend - http://www.talend.com
>
> --
> Jean-Baptiste Onofré
> http://blog.nanthrax.net
> Talend - http://www.talend.com
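The Scio word-count snippet quoted in the thread reads like ordinary Scala collection code. As a rough illustration of what each step computes, here is a minimal local sketch that applies the same transformations to an in-memory Seq[String] instead of a distributed SCollection. The object LocalWordCount and its helpers are hypothetical names, not part of Scio; the final sort is only there to make the output deterministic.

```scala
// Hypothetical local sketch: mirrors the quoted Scio word count on an in-memory
// Seq[String] instead of an SCollection, to show what each step computes.
object LocalWordCount {
  // Same tokenizer regex as the quoted snippet: split on runs of characters that
  // are not ASCII letters or apostrophes, then drop empty tokens.
  def tokenize(line: String): Seq[String] =
    line.split("[^a-zA-Z']+").filter(_.nonEmpty).toSeq

  // Local stand-in for countByValue(): map each distinct word to its count.
  def countByValue(words: Seq[String]): Map[String, Long] =
    words.groupBy(identity).map { case (w, ws) => w -> ws.size.toLong }

  // Whole pipeline: tokenize, count, and format as "word: count" lines
  // (sorted here only so the output order is deterministic).
  def wordCount(lines: Seq[String]): Seq[String] =
    countByValue(lines.flatMap(tokenize))
      .map(t => t._1 + ": " + t._2)
      .toSeq
      .sorted

  def main(args: Array[String]): Unit = {
    val lines = Seq("To be, or not to be:", "that is the question.")
    wordCount(lines).foreach(println)
  }
}
```

Running main prints one "word: count" line per distinct token, including "be: 2" for the two lowercase occurrences of "be" (the tokenizer is case-sensitive, so "To" and "to" are counted separately). The sketch says nothing about how Scio distributes these steps across workers.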
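Since session windows came up in the thread (and Neville asked what they are), here is a plain-Scala sketch of the merging rule for a single key, assuming Long timestamps and a fixed gap duration: each event gets the half-open window [ts, ts + gap), and overlapping windows merge into one session. SessionSketch and Window are hypothetical names; this is an illustration of the idea, not Beam's or Scio's implementation.

```scala
// Hypothetical sketch of session-window merging for one key, with plain Long
// timestamps (e.g. millis). Each event starts with the window [ts, ts + gap);
// windows that overlap are merged into a single session.
object SessionSketch {
  final case class Window(start: Long, end: Long) // half-open [start, end)

  def sessions(timestamps: Seq[Long], gap: Long): Seq[Window] = {
    require(gap > 0, "gap must be positive")
    timestamps.sorted.foldLeft(List.empty[Window]) {
      // The event's window overlaps the current session: extend the session.
      case (current :: rest, ts) if ts < current.end =>
        current.copy(end = math.max(current.end, ts + gap)) :: rest
      // Otherwise the event opens a new session.
      case (acc, ts) => Window(ts, ts + gap) :: acc
    }.reverse // sessions were prepended, so restore chronological order
  }
}
```

For example, events at 0, 5, 12 and 30 with a gap of 10 yield the sessions [0, 22) and [30, 40): the pause of 18 before the event at 30 exceeds the gap, so it starts a new session.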
