+1 here. I already liked Euphoria, and I like the merger even more :-)

Kenn

On Tue, Jan 2, 2018 at 8:45 AM, Tyler Akidau <taki...@google.com> wrote:

> +1, I'm supportive of seeing this move forward. What remaining concrete
> concerns are there?
>
> -Tyler
>
>
> On Tue, Jan 2, 2018 at 8:35 AM David Morávek <david.mora...@gmail.com>
> wrote:
>
>> Hello JB,
>>
>> can we help in any way to move things forward?
>>
>> Thanks,
>> D.
>>
>> On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Thanks Jan,
>>>
>>> It makes sense.
>>>
>>> Let me take a look at the code to understand the "interaction".
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>>>
>>>> Hi JB,
>>>>
>>>> basically you are not wrong. The project started about three or four
>>>> years ago with a goal to unify batch and streaming processing into a
>>>> single portable, executor-independent API. Because of that, it is
>>>> currently "close" to Beam in this sense. But we don't see much added
>>>> value in keeping this as a separate project when one of the key
>>>> differences is the API (not the model itself), so we would like to
>>>> focus on translation from the Euphoria API to Beam's SDK. That's why
>>>> we would like to see it as a DSL, so that it would be possible to use
>>>> the Euphoria API with Beam's runners as natively as possible.
>>>>
>>>> I hope I didn't make the subject even more unclear, if so, I'll be
>>>> happy to explain anything in more detail. :-)
>>>>
>>>> Jan
>>>>
>>>>
>>>> On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>>>>
>>>>> Hi Jan,
>>>>>
>>>>> Thanks for your answers.
>>>>>
>>>>> However, they confused me ;)
>>>>>
>>>>> Regarding what you replied, Euphoria seems like a programming
>>>>> model/SDK "close" to Beam more than a DSL on top of an existing
>>>>> Beam SDK.
>>>>>
>>>>> Am I wrong?
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>>>>>
>>>>>> Hi Ismael,
>>>>>>
>>>>>> basically we adopted Beam's design regarding partitioning
>>>>>> (https://github.com/seznam/euphoria/issues/160) and implemented the
>>>>>> sorting manually (https://github.com/seznam/euphoria/issues/158).
>>>>>> I'm not aware of the time model differences (Euphoria supports
>>>>>> ingestion and event time, we don't support processing time by
>>>>>> decision). Regarding other differences (looking into the Beam
>>>>>> capability matrix, I'd say that):
>>>>>>
>>>>>>  - we don't support stateful FlatMap (i.e. ParDo) for now
>>>>>>    (https://github.com/seznam/euphoria/issues/192)
>>>>>>
>>>>>>  - we don't support side inputs (by decision now, but might be
>>>>>>    reconsidered) and outputs
>>>>>>    (https://github.com/seznam/euphoria/issues/124)
>>>>>>
>>>>>>  - we support complete event-time windows (non-merging, merging,
>>>>>>    aligned, unaligned) and time control
>>>>>>
>>>>>>  - we don't support processing time by decision (might be
>>>>>>    reconsidered if a valid use-case is found)
>>>>>>
>>>>>>  - we support window triggering based on both time and data,
>>>>>>    including discarding and accumulating (without accumulating &
>>>>>>    retracting)
>>>>>>
>>>>>> All our executors (runners) - Flink, Spark and Local - implement the
>>>>>> complete model, which we enforce using an "operator test kit" that
>>>>>> all executors must pass. The Spark executor supports bounded sources
>>>>>> only (for now). As David said, we currently don't have a
>>>>>> serialization abstraction, so there is some work to be done in that
>>>>>> regard.
>>>>>>
>>>>>> Our intention is to completely supersede Euphoria. We would like to
>>>>>> consider the possibility of using executors that would not rely on
>>>>>> Beam, but that is optional now and should be straightforward.
>>>>>>
>>>>>> We'd be happy to answer any more questions you might have and thanks
>>>>>> a lot!
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>>
>>>>>> On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> It is great to see that you guys have achieved a maturity point to
>>>>>>> propose this. Congratulations on your work and the idea to
>>>>>>> contribute it to Beam.
>>>>>>>
>>>>>>> I remember from a previous discussion with Jan about the model
>>>>>>> mismatch between Euphoria and Beam, because of some design
>>>>>>> decisions of both projects. I remember you guys had some issues
>>>>>>> with the way Beam's sources do partitioning, as well as Beam's lack
>>>>>>> of sorted data (on shuffle a la hadoop). Also, if I remember well,
>>>>>>> the 'time' model of Euphoria was simpler than Beam's. I talk about
>>>>>>> all of this because I am curious about what parts of the Euphoria
>>>>>>> model you guys had to sacrifice to support Beam, and what parts of
>>>>>>> Beam's model should still be integrated into Euphoria (and if there
>>>>>>> is a straightforward path to do it).
>>>>>>>
>>>>>>> If I understand well, if this gets merged into Apache, this means
>>>>>>> that Euphoria's current implementation would be superseded by this
>>>>>>> DSL? I am curious because I would like to understand your level of
>>>>>>> investment in supporting the future of this DSL.
>>>>>>>
>>>>>>> Thanks and congrats again !
>>>>>>> Ismaël
>>>>>>>
>>>>>>> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré
>>>>>>> <j...@nanthrax.net> wrote:
>>>>>>>
>>>>>>>> Depending on the donation, you would need an ICLA for each
>>>>>>>> contributor, and a CCLA in addition to the SGA.
>>>>>>>>
>>>>>>>> You can sync with Davor and me for the legal stuff.
>>>>>>>> However, I would wait a little bit just to have feedback from the
>>>>>>>> whole team and start a formal vote.
>>>>>>>>
>>>>>>>> I would be happy to start the formal vote.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Thanks for the awesome feedback!
>>>>>>>>>
>>>>>>>>> Romain:
>>>>>>>>>
>>>>>>>>> We already use the Java Stream API in all operators where it
>>>>>>>>> makes sense (e.g. ReduceByKey). Still not sure if it was a good
>>>>>>>>> choice, but it can be easily converted to an iterator anyway.
>>>>>>>>>
>>>>>>>>> Side output support is coming soon; we have already done some
>>>>>>>>> initial work on this.
>>>>>>>>>
>>>>>>>>> Side inputs are not supported in the way you are used to from
>>>>>>>>> Beam, because they can be replaced by a Join operator on the same
>>>>>>>>> key (if annotated with broadcastHashJoin, it will be turned into
>>>>>>>>> a map-side join).
>>>>>>>>>
>>>>>>>>> The only significant difference from Beam is that we decided not
>>>>>>>>> to abstract serialization, so we need to add support for Type
>>>>>>>>> Hints, because of type erasure.
>>>>>>>>>
>>>>>>>>> Fluent API:
>>>>>>>>>
>>>>>>>>> The API is fluent within one operator. It is designed to "lead
>>>>>>>>> the programmer", which means that they will only be offered
>>>>>>>>> methods that make sense after the last method they used (e.g. in
>>>>>>>>> ReduceByKey, we know that after keyBy either reduceBy method
>>>>>>>>> should come). It is implemented as a series of builders.
>>>>>>>>>
>>>>>>>>> Davor:
>>>>>>>>>
>>>>>>>>> Thanks, I'll contact you, and will start the process of having
>>>>>>>>> all the necessary paperwork signed on our side, so we can get
>>>>>>>>> things moving.
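
To make the "series of builders" idea quoted above more concrete, here is a
minimal sketch of a staged builder in plain Java. It is not the Euphoria API:
the names StagedBuilderSketch, KeyByStage, ReduceByStage, OutputStage, of and
output are all hypothetical, and the "operator" simply folds an in-memory list.
The only point is that each step returns a type exposing just the methods that
make sense next, so keyBy can only be followed by reduceBy.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Hypothetical staged builder; illustrative only, NOT the Euphoria API. */
final class StagedBuilderSketch {

  /** Stage 1: the only method offered is keyBy. */
  interface KeyByStage<T> {
    <K> ReduceByStage<T, K> keyBy(Function<T, K> keyFn);
  }

  /** Stage 2: after keyBy, the only method offered is reduceBy. */
  interface ReduceByStage<T, K> {
    OutputStage<K, T> reduceBy(BinaryOperator<T> combiner);
  }

  /** Stage 3: the operator is fully specified and can be evaluated. */
  interface OutputStage<K, V> {
    Map<K, V> output();
  }

  /** Entry point: wraps an in-memory input and returns the first stage. */
  static <T> KeyByStage<T> of(List<T> input) {
    return new KeyByStage<T>() {
      @Override
      public <K> ReduceByStage<T, K> keyBy(Function<T, K> keyFn) {
        // reduceBy returns the final stage; output() does the actual work.
        return combiner -> () -> {
          Map<K, T> result = new LinkedHashMap<>();
          for (T element : input) {
            // Combine the element with the running value for its key.
            result.merge(keyFn.apply(element), element, combiner);
          }
          return result;
        };
      }
    };
  }

  public static void main(String[] args) {
    Map<String, Integer> sums =
        of(Arrays.asList(1, 2, 3, 4, 5))
            .keyBy(n -> n % 2 == 0 ? "even" : "odd")
            .reduceBy(Integer::sum)
            .output();
    System.out.println(sums); // prints {odd=9, even=6}
  }
}
```

With this shape, calling reduceBy before keyBy, or output before reduceBy, is
a compile-time error rather than a runtime one, which is one common way to
get the "only offered methods that make sense" behaviour from an IDE.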
>>>>>>>>>
>>>>>>>>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
>>>>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi guys
>>>>>>>>>
>>>>>>>>> A DSL would be very welcome, in particular if fluent.
>>>>>>>>>
>>>>>>>>> Open question: did you consider implementing the Stream API
>>>>>>>>> (surely extending it to have a BeamStream and a few more features
>>>>>>>>> like sides etc)? It would be very natural, easy to integrate
>>>>>>>>> anywhere, and would avoid having to discover a new API.
>>>>>>>>>
>>>>>>>>> Hazelcast Jet did it so I don't see why Beam couldn't.
>>>>>>>>>
>>>>>>>>> On Dec 18, 2017 at 07:26, "Davor Bonaci" <da...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi David,
>>>>>>>>> As JB noted, merging of these two projects is a great idea. In
>>>>>>>>> fact, some of us have had those discussions in the past.
>>>>>>>>>
>>>>>>>>> Legally, nothing particular is strictly necessary as the code
>>>>>>>>> seems to already be Apache 2.0 licensed. We don't, however, want
>>>>>>>>> to be perceived as making hostile forks, so it would be great to
>>>>>>>>> file a Software Grant Agreement with the ASF Secretary. I can
>>>>>>>>> help with the process, as necessary.
>>>>>>>>>
>>>>>>>>> Project alignment-wise, there aren't any particular blockers that
>>>>>>>>> I am aware of. We welcome DSLs.
>>>>>>>>>
>>>>>>>>> Technically, the code would start in a feature branch. During
>>>>>>>>> this stage, we'd need to validate a few things, including
>>>>>>>>> confirmation the code and dependencies match the ASF policy,
>>>>>>>>> automate testing in Beam's tooling, etc.
>>>>>>>>> At that point, we'd take a community vote to accept the component
>>>>>>>>> into master, and consider author(s) for committership in the
>>>>>>>>> overall project.
>>>>>>>>>
>>>>>>>>> Welcome to the ASF and Beam -- we are thrilled to have you! Hope
>>>>>>>>> this helps, and please reach out if anybody on our end can help,
>>>>>>>>> including JB or myself.
>>>>>>>>>
>>>>>>>>> Davor
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>>>>>>>>> <j...@nanthrax.net> wrote:
>>>>>>>>>
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> Generally speaking, having different fluent DSLs on top of the
>>>>>>>>> Beam SDK is great.
>>>>>>>>>
>>>>>>>>> I would like to take a look at your wordcount examples to give
>>>>>>>>> you complete feedback. I like the idea and a fluent Java DSL is
>>>>>>>>> valuable.
>>>>>>>>>
>>>>>>>>> Let's wait for feedback from others. If we have a consensus, then
>>>>>>>>> I would be more than happy to help you with the donation (I
>>>>>>>>> worked on the Camel Java DSL a while ago, so I have some
>>>>>>>>> experience here).
>>>>>>>>>
>>>>>>>>> Thanks !
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On 12/17/2017 07:00 PM, David Morávek wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> First of all, thanks for the amazing work the Apache Beam
>>>>>>>>> community is doing!
>>>>>>>>>
>>>>>>>>> In 2014, we started development of a runtime-independent Java 8
>>>>>>>>> API that helps us to create unified big-data processing flows. It
>>>>>>>>> has been used as a core building block of the Seznam.cz web
>>>>>>>>> crawler data infrastructure ever since.
>>>>>>>>> Its design principles and execution model are very similar to
>>>>>>>>> Apache Beam.
>>>>>>>>>
>>>>>>>>> This API was open sourced in 2016, under the name Euphoria API:
>>>>>>>>>
>>>>>>>>> https://github.com/seznam/euphoria
>>>>>>>>>
>>>>>>>>> As it is very similar to Apache Beam, we feel that it is not
>>>>>>>>> worth duplicating effort in terms of development of new runtimes
>>>>>>>>> and fine-tuning of current ones.
>>>>>>>>>
>>>>>>>>> The main blocker for us to switch to Apache Beam is the lack of a
>>>>>>>>> Java 8 API. We propose the integration of the Euphoria API into
>>>>>>>>> Apache Beam as a Java 8 DSL, in order to share our effort with
>>>>>>>>> the community.
>>>>>>>>>
>>>>>>>>> A simple example of Euphoria API usage can be found here:
>>>>>>>>>
>>>>>>>>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>>>>>>>>
>>>>>>>>> If you feel that the Beam community could benefit from our work,
>>>>>>>>> we would love to start working on Euphoria integration into
>>>>>>>>> Apache Beam (we already have a working POC, with a few basic
>>>>>>>>> operators implemented).
>>>>>>>>>
>>>>>>>>> I look forward to hearing from you,
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>> jbono...@apache.org
>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>> David Morávek
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jean-Baptiste Onofré
>>>>>>>> jbono...@apache.org
>>>>>>>> http://blog.nanthrax.net
>>>>>>>> Talend - http://www.talend.com
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
>>
>> --
>> Best regards
>>
>> David Morávek
>>
>