Hi David,

Absolutely! Let's move forward on the preparation steps.

Are you on Slack and/or Hangouts so we can plan this?

Thanks,
Regards
JB

On 01/02/2018 05:35 PM, David Morávek wrote:
Hello JB,

can we help in any way to move things forward?

Thanks,
D.

On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

    Thanks Jan,

    It makes sense.

    Let me take a look at the code to understand the "interaction".

    Regards
    JB


    On 12/18/2017 04:26 PM, Jan Lukavský wrote:

        Hi JB,

        basically you are not wrong. The project started about three or four
        years ago with the goal of unifying batch and stream processing in a
        single portable, executor-independent API. Because of that, it is
        currently "close" to Beam in this sense. But we don't see much added
        value in keeping it as a separate project when one of the key
        differences is the API (not the model itself), so we would like to
        focus on translating the Euphoria API to Beam's SDK. That's why we
        would like to see it as a DSL, so that it would be possible to use
        the Euphoria API with Beam's runners as natively as possible.

        I hope I didn't make the subject even less clear; if so, I'll be happy
        to explain anything in more detail. :-)

            Jan


        On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:

            Hi Jan,

            Thanks for your answers.

            However, they confused me ;)

            Based on your reply, Euphoria seems more like a programming
            model/SDK "close" to Beam than a DSL on top of an existing Beam
            SDK.

            Am I wrong?

            Regards
            JB

            On 12/18/2017 03:44 PM, Jan Lukavský wrote:

                Hi Ismael,

                basically we adopted Beam's design regarding partitioning
                (https://github.com/seznam/euphoria/issues/160) and implemented
                the sorting manually
                (https://github.com/seznam/euphoria/issues/158). I'm not aware
                of any time model differences (Euphoria supports ingestion and
                event time; we don't support processing time, by decision).
                Regarding other differences (looking at the Beam capability
                matrix), I'd say that:

                   - we don't support stateful FlatMap (i.e. ParDo) for now
                (https://github.com/seznam/euphoria/issues/192)

                   - we don't support side inputs (by decision for now, but
                this might be reconsidered) and side outputs
                (https://github.com/seznam/euphoria/issues/124)

                   - we support complete event-time windows (non-merging,
                merging, aligned, unaligned) and time control

                   - we don't support processing time by decision (might be
                reconsidered if a valid use-case is found)

                   - we support window triggering based on both time and data,
                including discarding and accumulating modes (but not
                accumulating & retracting)

                All our executors (runners) - Flink, Spark and Local - implement
                the complete model, which we enforce using an "operator test kit"
                that all executors must pass. The Spark executor supports bounded
                sources only (for now). As David said, we currently don't have
                a serialization abstraction, so there is some work to be done in
                that regard.

                Our intention is to completely supersede Euphoria. We would like
                to consider the possibility of using executors that do not rely
                on Beam, but that is optional for now and should be
                straightforward.

                We'd be happy to answer any more questions you might have and
                thanks a lot!

                Best,

                   Jan


                On 12/18/2017 03:19 PM, Ismaël Mejía wrote:

                    Hi,

                    It is great to see that you guys have reached a point of
                    maturity where you can propose this. Congratulations on your
                    work and on the idea of contributing it to Beam.

                    I remember from a previous discussion with Jan the model
                    mismatch between Euphoria and Beam, due to some design
                    decisions of both projects. I remember you guys had some
                    issues with the way Beam's sources do partitioning, as well
                    as Beam's lack of sorted data (on shuffle, à la Hadoop).
                    Also, if I remember well, the 'time' model of Euphoria was
                    simpler than Beam's. I mention all of this because I am
                    curious about what parts of the Euphoria model you guys had
                    to sacrifice to support Beam, and what parts of Beam's model
                    should still be integrated into Euphoria (and if there is a
                    straightforward path to do it).

                    If I understand correctly, if this gets merged into Apache,
                    Euphoria's current implementation would be superseded by
                    this DSL? I ask because I would like to understand your
                    level of investment in supporting the future of this DSL.

                    Thanks and congrats again!
                    Ismaël

                    On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré
                    <j...@nanthrax.net> wrote:

                        Depending on the donation, you would need an ICLA for
                        each contributor, and a CCLA in addition to the SGA.

                        You can sync with Davor and me on the legal stuff.
                        However, I would wait a little bit to get feedback
                        from the whole team and then start a formal vote.

                        I would be happy to start the formal vote.

                        Regards
                        JB

                        On 12/18/2017 10:03 AM, David Morávek wrote:

                            Hello,

                            Thanks for the awesome feedback!

                            Romain:

                            We already use the Java Stream API in all operators
                            where it makes sense (e.g. ReduceByKey). We are
                            still not sure it was a good choice, but it can
                            easily be converted to an iterator anyway.

                            Side outputs support is coming soon; we have
                            already done some initial work on this.

                            Side inputs are not supported in the way you are
                            used to from Beam, because they can be replaced by
                            the Join operator on the same key (if annotated
                            with broadcastHashJoin, it will be turned into a
                            map-side join).

                            The only significant difference from Beam is that
                            we decided not to abstract serialization, so we
                            need to add support for type hints because of type
                            erasure.

                            Fluent API:

                            The API is fluent within one operator. It is
                            designed to "lead the programmer", which means that
                            the programmer is only offered methods that make
                            sense after the last method used (e.g. in
                            ReduceByKey, we know that after keyBy either
                            reduceBy method should come). It is implemented as
                            a series of builders.

                            Davor:

                            Thanks, I'll contact you and start the process of
                            having all the necessary paperwork signed on our
                            side, so we can get things moving.


                            On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
                            <rmannibu...@gmail.com> wrote:

                                  Hi guys

                                  A DSL would be very welcome, in particular if
                                  it is fluent.

                                  Open question: did you consider implementing
                                  the Stream API (surely extending it to have a
                                  BeamStream and a few more features like sides
                                  etc)? It would be very natural, easy to
                                  integrate anywhere, and would avoid having to
                                  discover a new API.

                                  Hazelcast Jet did it, so I don't see why Beam
                                  couldn't.

                                  On Dec 18, 2017 at 07:26, "Davor Bonaci"
                                  <da...@apache.org> wrote:

                                      Hi David,
                                      As JB noted, merging these two projects is
                                      a great idea. In fact, some of us have had
                                      those discussions in the past.

                                      Legally, nothing in particular is strictly
                                      necessary, as the code seems to already be
                                      Apache 2.0 licensed. We don't, however,
                                      want to be perceived as making hostile
                                      forks, so it would be great to file a
                                      Software Grant Agreement with the ASF
                                      Secretary. I can help with the process, as
                                      necessary.

                                      Project alignment-wise, there aren't any
                                      particular blockers that I am aware of. We
                                      welcome DSLs.

                                      Technically, the code would start in a
                                      feature branch. During this stage, we'd
                                      need to validate a few things, including
                                      confirming that the code and dependencies
                                      match the ASF policy, automating testing
                                      in Beam's tooling, etc. At that point,
                                      we'd take a community vote to accept the
                                      component into master, and consider
                                      author(s) for committership in the overall
                                      project.

                                      Welcome to the ASF and Beam -- we are
                                      thrilled to have you! Hope this helps, and
                                      please reach out if anybody on our end can
                                      help, including JB or myself.

                                      Davor


                                      On Sun, Dec 17, 2017 at 10:13 AM,
                                      Jean-Baptiste Onofré <j...@nanthrax.net>
                                      wrote:

                                          Hi David,

                                          Generally speaking, having different
                                          fluent DSLs on top of the Beam SDK is
                                          great.

                                          I would like to take a look at your
                                          wordcount examples to give you
                                          complete feedback. I like the idea,
                                          and a fluent Java DSL is valuable.

                                          Let's wait for feedback from others.
                                          If we have a consensus, then I would
                                          be more than happy to help you with
                                          the donation (I worked on the Camel
                                          Java DSL a while ago, so I have some
                                          experience here).

                                          Thanks!
                                          Regards
                                          JB

                                          On 12/17/2017 07:00 PM, David Morávek wrote:

                                              Hello,


                                              First of all, thanks for the
                                              amazing work the Apache Beam
                                              community is doing!


                                              In 2014, we started development of
                                              a runtime-independent Java 8 API
                                              that helps us create unified
                                              big-data processing flows. It has
                                              been used as a core building block
                                              of the Seznam.cz web crawler data
                                              infrastructure ever since. Its
                                              design principles and execution
                                              model are very similar to Apache
                                              Beam.


                                              This API was open sourced in 2016
                                              under the name Euphoria API:

                                              https://github.com/seznam/euphoria


                                              As it is very similar to Apache
                                              Beam, we feel that it is not worth
                                              duplicating effort on developing
                                              new runtimes and fine-tuning the
                                              current ones.


                                              The main blocker for us to switch
                                              to Apache Beam is the lack of a
                                              Java 8 API. We propose the
                                              integration of the Euphoria API
                                              into Apache Beam as a Java 8 DSL,
                                              in order to share our effort with
                                              the community.


                                              A simple example of Euphoria API
                                              usage can be found here:

https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount



                                              If you feel that the Beam
                                              community could benefit from our
                                              work, we would love to start
                                              working on integrating Euphoria
                                              into Apache Beam (we already have
                                              a working POC, with a few basic
                                              operators implemented).


                                              I look forward to hearing from you,

                                              David


                                          --
                                          Jean-Baptiste Onofré
                                          jbono...@apache.org
                                          http://blog.nanthrax.net
                                          Talend - http://www.talend.com





-- Regards

                            David Morávek


-- Jean-Baptiste Onofré
                        jbono...@apache.org
                        http://blog.nanthrax.net
                        Talend - http://www.talend.com





-- Jean-Baptiste Onofré
    jbono...@apache.org
    http://blog.nanthrax.net
    Talend - http://www.talend.com




--
Regards

David Morávek

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
