Great !

Thanks !
Regards
JB

On 01/03/2018 07:29 AM, David Morávek wrote:
Hello JB,

Perfect! I'm already on the Beam Slack workspace, I'll contact you once I get to the office.

Thanks!
D.

On Wed, Jan 3, 2018 at 6:19 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

    Hi David,

    absolutely !! Let's move forward on the preparation steps.

    Are you on Slack and/or Hangouts so we can plan this?

    Thanks,
    Regards
    JB

    On 01/02/2018 05:35 PM, David Morávek wrote:

        Hello JB,

        can we help in any way to move things forward?

        Thanks,
        D.

        On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

             Thanks Jan,

             It makes sense.

             Let me take a look at the code to understand the "interaction".

             Regards
             JB


             On 12/18/2017 04:26 PM, Jan Lukavský wrote:

                 Hi JB,

                 basically you are not wrong. The project started about three or four
                 years ago with the goal of unifying batch and stream processing in a
                 single portable, executor-independent API. Because of that, it is
                 currently "close" to Beam in this sense. But we don't see much added
                 value in keeping it as a separate project when one of the key
                 differences is the API (not the model itself), so we would like to
                 focus on translating the Euphoria API to Beam's SDK. That's why we
                 would like to see it as a DSL, so that the Euphoria API can be used
                 with Beam's runners as natively as possible.

                 I hope I didn't make the subject even more unclear; if so, I'll be
                 happy to explain anything in more detail. :-)

                     Jan


                 On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:

                     Hi Jan,

                     Thanks for your answers.

                     However, they confused me ;)

                     Regarding what you replied, Euphoria seems more like a programming
                     model/SDK "close" to Beam than a DSL on top of an existing Beam
                     SDK.

                     Am I wrong ?

                     Regards
                     JB

                     On 12/18/2017 03:44 PM, Jan Lukavský wrote:

                         Hi Ismael,

                         basically, we adopted Beam's design regarding partitioning
                         (https://github.com/seznam/euphoria/issues/160) and implemented
                         the sorting manually
                         (https://github.com/seznam/euphoria/issues/158). I'm not aware
                         of any time model differences (Euphoria supports ingestion and
                         event time; we don't support processing time, by decision).
                         Regarding other differences (looking at the Beam capability
                         matrix), I'd say that:

                            - we don't support stateful FlatMap (i.e. ParDo) for now
                              (https://github.com/seznam/euphoria/issues/192)

                            - we don't support side inputs (by decision for now, but this
                              might be reconsidered) or side outputs
                              (https://github.com/seznam/euphoria/issues/124)

                            - we support complete event-time windows (non-merging,
                              merging, aligned, unaligned) and time control

                            - we don't support processing time, by decision (this might be
                              reconsidered if a valid use case is found)

                            - we support window triggering based on both time and data,
                              including discarding and accumulating modes (but not
                              accumulating & retracting)

                         All our executors (runners), Flink, Spark and Local, implement
                         the complete model, which we enforce using an "operator test kit"
                         that all executors must pass. The Spark executor supports bounded
                         sources only (for now). As David said, we currently don't have a
                         serialization abstraction, so there is some work to be done in
                         that regard.

                         Our intention is to completely supersede Euphoria. We would like
                         to consider the possibility of using executors that do not rely on
                         Beam, but that is optional for now and should be straightforward.

                         We'd be happy to answer any more questions you might have, and
                         thanks a lot!

                         Best,

                            Jan


                         On 12/18/2017 03:19 PM, Ismaël Mejía wrote:

                             Hi,

                             It is great to see that you guys have reached a point of maturity
                             where you can propose this. Congratulations on your work and on
                             the idea of contributing it to Beam.

                             I remember from a previous discussion with Jan that there was a
                             model mismatch between Euphoria and Beam, because of some design
                             decisions in both projects. I remember you guys had some issues
                             with the way Beam's sources do partitioning, as well as Beam's
                             lack of sorted data (on shuffle, à la Hadoop). Also, if I remember
                             well, the 'time' model of Euphoria was simpler than Beam's. I
                             bring all of this up because I am curious about what parts of the
                             Euphoria model you guys had to sacrifice to support Beam, and what
                             parts of Beam's model should still be integrated into Euphoria
                             (and whether there is a straightforward path to do it).

                             If I understand correctly, if this gets merged into Apache, this
                             means that Euphoria's current implementation would be superseded
                             by this DSL? I am curious because I would like to understand your
                             level of investment in supporting the future of this DSL.

                             Thanks and congrats again !
                             Ismaël

                             On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

                                 Depending on the donation, you would need an ICLA for each
                                 contributor, and a CCLA in addition to the SGA.

                                 You can sync with Davor and me on the legal stuff.
                                 However, I would wait a little bit, just to get feedback
                                 from the whole team and start a formal vote.

                                 I would be happy to start the formal vote.

                                 Regards
                                 JB

                                 On 12/18/2017 10:03 AM, David Morávek wrote:

                                     Hello,

                                     Thanks for the awesome feedback!

                                     Romain:

                                     We already use the Java Stream API in all operators
                                     where it makes sense (e.g. ReduceByKey). We're still not
                                     sure it was a good choice, but it can easily be
                                     converted to an iterator anyway.

                                     Side output support is coming soon; we have already
                                     done some initial work on this.

                                     Side inputs are not supported in the way you are used
                                     to from Beam, because they can be replaced by a Join
                                     operator on the same key (if annotated with
                                     broadcastHashJoin, it will be turned into a map-side
                                     join).
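
The map-side join mentioned above can be illustrated with a minimal sketch. This is plain Java, not Euphoria or Beam code: the class `MapSideJoinSketch`, the `join` method, and the sample data are all hypothetical. It only shows the general idea that the small side is kept in memory as a hash map, so the large side can be joined element by element without being shuffled by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a map-side (broadcast hash) join; not Euphoria code.
final class MapSideJoinSketch {

  // Joins (userId, page) events with a small userId -> country lookup table.
  // The small side is held in memory as a hash map, so the large side can be
  // processed element by element without shuffling it by key.
  static List<String> join(List<String[]> events, Map<String, String> countryByUser) {
    List<String> joined = new ArrayList<>();
    for (String[] event : events) {
      String country = countryByUser.get(event[0]);
      if (country != null) { // inner join: skip events without a match
        joined.add(event[0] + "," + event[1] + "," + country);
      }
    }
    return joined;
  }

  public static void main(String[] args) {
    Map<String, String> countryByUser = new HashMap<>();
    countryByUser.put("u1", "CZ");
    countryByUser.put("u2", "FR");

    List<String[]> events = Arrays.asList(
        new String[] {"u1", "/home"},
        new String[] {"u2", "/cart"},
        new String[] {"u3", "/help"});

    System.out.println(join(events, countryByUser));
  }
}
```

In a distributed runner, the lookup map would be shipped to every worker, which is why this approach only suits a side that fits in memory.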

                                     The only significant difference from Beam is that we
                                     decided not to abstract serialization, so we need to
                                     add support for type hints because of type erasure.
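
As a minimal illustration of why type erasure calls for type hints, consider the sketch below. It is plain Java and not Euphoria code: `TypeHintSketch` and `TypeHint` are hypothetical names, and a real type-hint mechanism would likely also need to describe nested generic types. The sketch shows that generic type arguments are not available at runtime, and that an explicit token can carry them instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of why type hints are needed; not Euphoria code.
final class TypeHintSketch {

  // A minimal "type hint": an explicit runtime token for a type parameter.
  static final class TypeHint<T> {
    private final Class<T> type;

    private TypeHint(Class<T> type) {
      this.type = type;
    }

    static <T> TypeHint<T> of(Class<T> type) {
      return new TypeHint<>(type);
    }

    Class<T> getType() {
      return type;
    }
  }

  // Because of erasure, both lists share the same runtime class, so the
  // element type cannot be recovered from the object itself.
  static boolean erasedClassesMatch() {
    List<String> strings = new ArrayList<>();
    List<Integer> numbers = new ArrayList<>();
    return strings.getClass() == numbers.getClass();
  }

  public static void main(String[] args) {
    System.out.println(erasedClassesMatch());

    // The hint keeps the element type available at runtime.
    TypeHint<Integer> hint = TypeHint.of(Integer.class);
    System.out.println(hint.getType().getName());
  }
}
```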

                                     Fluent API:

                                     The API is fluent within one operator. It is designed to
                                     "lead the programmer", which means that they will only
                                     be offered methods that make sense after the last
                                     method they used (e.g. in ReduceByKey, we know that one
                                     of the reduceBy methods should come after keyBy). It is
                                     implemented as a series of builders.
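
The "series of builders" idea described above can be sketched as follows. This is a simplified, hypothetical illustration in plain Java, not the actual Euphoria API: the names `StagedBuilderSketch`, `ReduceByKeyBuilder`, `KeyedStage`, and `OutputStage` are assumptions, and the operator runs eagerly on an in-memory list. Each stage is a separate type that exposes only the methods valid at that point, so the compiler (and IDE completion) guides the caller from `keyBy` to `reduceBy` to `output`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical sketch of a staged ("step") builder; not the Euphoria API.
final class StagedBuilderSketch {

  // Stage 1: the only method offered is keyBy.
  static final class ReduceByKeyBuilder<T> {
    private final List<T> input;

    private ReduceByKeyBuilder(List<T> input) {
      this.input = input;
    }

    static <T> ReduceByKeyBuilder<T> of(List<T> input) {
      return new ReduceByKeyBuilder<>(input);
    }

    <K> KeyedStage<T, K> keyBy(Function<T, K> keyFn) {
      return new KeyedStage<>(input, keyFn);
    }
  }

  // Stage 2: after keyBy, only reduceBy is offered.
  static final class KeyedStage<T, K> {
    private final List<T> input;
    private final Function<T, K> keyFn;

    KeyedStage(List<T> input, Function<T, K> keyFn) {
      this.input = input;
      this.keyFn = keyFn;
    }

    OutputStage<K, T> reduceBy(BinaryOperator<T> reducer) {
      // Group by key and fold the values of each key with the reducer.
      Map<K, T> result = new LinkedHashMap<>();
      for (T element : input) {
        result.merge(keyFn.apply(element), element, reducer);
      }
      return new OutputStage<>(result);
    }
  }

  // Stage 3: the terminal stage only exposes the result.
  static final class OutputStage<K, V> {
    private final Map<K, V> result;

    OutputStage(Map<K, V> result) {
      this.result = result;
    }

    Map<K, V> output() {
      return result;
    }
  }

  public static void main(String[] args) {
    Map<Integer, String> wordsByLength =
        ReduceByKeyBuilder.of(Arrays.asList("a", "bb", "cc", "d"))
            .keyBy(String::length)
            .reduceBy((left, right) -> left + "," + right)
            .output();
    System.out.println(wordsByLength);
  }
}
```

With this shape, calling `reduceBy` before `keyBy` is a compile-time error rather than a runtime one, which is the practical benefit of splitting one operator into several builder types.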

                                     Davor:

                                     Thanks, I'll contact you and start the process of
                                     getting all the necessary paperwork signed on our side,
                                     so we can get things moving.


                                     On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

                                           Hi guys

                                           A DSL would be very welcome, in particular if it is
                                           fluent.

                                           Open question: did you look into implementing the
                                           Stream API (presumably extending it to have a
                                           BeamStream and a few more features like sides, etc.)?
                                           It would be very natural, easy to integrate
                                           anywhere, and would avoid having to learn a new API.

                                           Hazelcast Jet did it, so I don't see why Beam
                                           couldn't.

                                           On Dec 18, 2017 at 07:26, "Davor Bonaci" <da...@apache.org> wrote:

                                               Hi David,
                                               As JB noted, merging these two projects is a great
                                               idea. In fact, some of us have had those discussions
                                               in the past.

                                               Legally, nothing particular is strictly necessary, as
                                               the code seems to already be Apache 2.0 licensed. We
                                               don't, however, want to be perceived as making
                                               hostile forks, so it would be great to file a Software
                                               Grant Agreement with the ASF Secretary. I can help
                                               with the process, as necessary.

                                               Project alignment-wise, there aren't any particular
                                               blockers that I am aware of. We welcome DSLs.

                                               Technically, the code would start in a feature branch.
                                               During this stage, we'd need to validate a few things,
                                               including confirming that the code and dependencies
                                               match ASF policy, automating testing in Beam's
                                               tooling, etc. At that point, we'd take a community
                                               vote to accept the component into master, and consider
                                               the author(s) for committership in the overall project.

                                               Welcome to the ASF and Beam -- we are thrilled to have
                                               you! Hope this helps, and please reach out if anybody
                                               on our end can help, including JB or myself.

                                               Davor


                                               On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

                                                   Hi David,

                                                   Generally speaking, having different fluent DSLs on
                                                   top of the Beam SDK is great.

                                                   I would like to take a look at your wordcount
                                                   examples to give you complete feedback. I like the
                                                   idea, and a fluent Java DSL is valuable.

                                                   Let's wait for feedback from others. If we have a
                                                   consensus, then I would be more than happy to help
                                                   you with the donation (I worked on the Camel Java
                                                   DSL a while ago, so I have some experience here).

                                                   Thanks !
                                                   Regards
                                                   JB

                                                   On 12/17/2017 07:00 PM, David Morávek wrote:

                                                       Hello,


                                                       First of all, thanks for the amazing work the Apache
                                                       Beam community is doing!

                                                       In 2014, we started development of a
                                                       runtime-independent Java 8 API that helps us create
                                                       unified big-data processing flows. It has been used as
                                                       a core building block of the Seznam.cz web crawler
                                                       data infrastructure ever since. Its design principles
                                                       and execution model are very similar to Apache Beam's.

                                                       This API was open sourced in 2016 under the name
                                                       Euphoria API:

                                                       https://github.com/seznam/euphoria

                                                       As it is very similar to Apache Beam, we feel that it
                                                       is not worth duplicating effort on developing new
                                                       runtimes and fine-tuning current ones.

                                                       The main blocker for us in switching to Apache Beam is
                                                       the lack of a Java 8 API. We propose integrating the
                                                       Euphoria API into Apache Beam as a Java 8 DSL, in
                                                       order to share our effort with the community.

                                                       A simple example of Euphoria API usage can be found
                                                       here:

                                                       https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount

                                                       If you feel that the Beam community could benefit from
                                                       our work, we would love to start working on Euphoria
                                                       integration into Apache Beam (we already have a
                                                       working POC, with a few basic operators implemented).

                                                       I look forward to hearing from you,

                                                       David


                                                   --
                                                   Jean-Baptiste Onofré
                                                   jbono...@apache.org
                                                   http://blog.nanthrax.net
                                                   Talend - http://www.talend.com




                                     --
                                     Best regards,
                                     David Morávek


                                 --
                                 Jean-Baptiste Onofré
                                 jbono...@apache.org
                                 http://blog.nanthrax.net
                                 Talend - http://www.talend.com





             --
             Jean-Baptiste Onofré
             jbono...@apache.org
             http://blog.nanthrax.net
             Talend - http://www.talend.com




--
Best regards,
David Morávek


    --
    Jean-Baptiste Onofré
    jbono...@apache.org
    http://blog.nanthrax.net
    Talend - http://www.talend.com



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
