Hello JB,
Perfect! I'm already on the Beam Slack workspace; I'll contact you once I get to the office.
Thanks!
D.
On Wed, Jan 3, 2018 at 6:19 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Hi David,
Absolutely! Let's move forward on the preparation steps.
Are you on Slack and/or Hangouts to plan this?
Thanks,
Regards
JB
On 01/02/2018 05:35 PM, David Morávek wrote:
Hello JB,
can we help in any way to move things forward?
Thanks,
D.
On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Thanks Jan,
It makes sense.
Let me take a look at the code to understand the "interaction".
Regards
JB
On 12/18/2017 04:26 PM, Jan Lukavský wrote:
Hi JB,
basically, you are not wrong. The project started about three or four years ago with the goal of unifying batch and streaming processing into a single portable, executor-independent API. Because of that, it is currently "close" to Beam in this sense. However, we don't see much added value in keeping it as a separate project, since one of the key differences is the API (not the model itself), so we would like to focus on translating the Euphoria API to Beam's SDK. That's why we would like to see it as a DSL, so that the Euphoria API can be used with Beam's runners as natively as possible.
I hope I didn't make the subject even more unclear; if so, I'll be happy to explain anything in more detail. :-)
Jan
On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
Hi Jan,
Thanks for your answers.
However, they confused me ;)
From your replies, Euphoria seems more like a programming model/SDK "close" to Beam than a DSL on top of an existing Beam SDK.
Am I wrong ?
Regards
JB
On 12/18/2017 03:44 PM, Jan Lukavský wrote:
Hi Ismael,
basically, we adopted Beam's design regarding partitioning (https://github.com/seznam/euphoria/issues/160) and implemented the sorting manually (https://github.com/seznam/euphoria/issues/158). I'm not aware of any time model differences (Euphoria supports ingestion and event time; we don't support processing time by decision).
Regarding other differences (looking at the Beam capability matrix), I'd say that:
- we don't support stateful FlatMap (i.e. ParDo) for now (https://github.com/seznam/euphoria/issues/192)
- we don't support side inputs (by decision for now, but this might be reconsidered) or side outputs (https://github.com/seznam/euphoria/issues/124)
- we support complete event-time windows (non-merging, merging, aligned, unaligned) and time control
- we don't support processing time by decision (this might be reconsidered if a valid use case is found)
- we support window triggering based on both time and data, including discarding and accumulating modes (but not accumulating & retracting)
All our executors (runners), namely Flink, Spark and Local, implement the complete model, which we enforce using an "operator test kit" that all executors must pass. The Spark executor supports bounded sources only (for now). As David said, we currently don't have a serialization abstraction, so there is some work to be done in that regard.
Our intention is to completely supersede Euphoria. We would like to consider the possibility of using executors that would not rely on Beam, but that is optional for now and should be straightforward.
We'd be happy to answer any more questions you might have, and thanks a lot!
Best,
Jan
On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
Hi,
It is great to see that you guys have reached a point of maturity where you can propose this. Congratulations on your work and on the idea of contributing it to Beam.
I remember a previous discussion with Jan about the model mismatch between Euphoria and Beam, due to some design decisions of both projects. I remember you guys had some issues with the way Beam's sources do partitioning, as well as Beam's lack of sorted data (on shuffle, à la Hadoop). Also, if I remember well, Euphoria's 'time' model was simpler than Beam's. I bring all of this up because I am curious about which parts of the Euphoria model you guys had to sacrifice to support Beam, and which parts of Beam's model should still be integrated into Euphoria (and whether there is a straightforward path to do it).
If I understand correctly, if this gets merged into Apache, would Euphoria's current implementation be superseded by this DSL? I am curious because I would like to understand your level of investment in supporting the future of this DSL.
Thanks and congrats again!
Ismaël
On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Depending on the donation, you would need an ICLA for each contributor, and a CCLA in addition to the SGA.
You can sync with Davor and me on the legal stuff.
However, I would wait a little bit to get feedback from the whole team before starting a formal vote.
I would be happy to start the formal vote.
Regards
JB
On 12/18/2017 10:03 AM, David Morávek wrote:
Hello,
Thanks for the awesome feedback!
Romain:
We already use the Java Stream API in all operators where it makes sense (e.g. ReduceByKey). We are still not sure it was a good choice, but it can easily be converted to an iterator anyway.
Side output support is coming soon; we have already done some initial work on this.
Side inputs are not supported in the way you are used to from Beam, because they can be replaced by a Join operator on the same key (if annotated with broadcastHashJoin, it will be turned into a map-side join).
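To make the side-input replacement above concrete, here is a minimal sketch of what a map-side (broadcast hash) join does, written against plain Java collections rather than Euphoria's actual Join operator. The class name `BroadcastHashJoinSketch` and its `join` method are hypothetical and chosen only for illustration; the point is that the small side is turned into an in-memory hash table once, and the large side is streamed past it without being shuffled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration of a map-side (broadcast hash) join on plain
// collections; this is NOT Euphoria's Join operator, just the technique.
final class BroadcastHashJoinSketch {

  private BroadcastHashJoinSketch() {}

  // Builds a hash table from the small ("broadcast") side once, then streams the
  // large side and looks up each element's key, so the large side is never shuffled.
  static <L, R, K, O> List<O> join(
      List<L> large, Function<L, K> largeKey,
      List<R> small, Function<R, K> smallKey,
      BiFunction<L, R, O> combine) {
    // Group the small side by key; several right elements may share one key.
    Map<K, List<R>> table = small.stream().collect(Collectors.groupingBy(smallKey));
    List<O> out = new ArrayList<>();
    for (L left : large) {
      // Inner-join semantics: left elements with no matching key are dropped.
      for (R right : table.getOrDefault(largeKey.apply(left), Collections.emptyList())) {
        out.add(combine.apply(left, right));
      }
    }
    return out;
  }
}
```

For example, joining click events `"u1:home"`, `"u2:cart"`, `"u3:home"` against a small user table `"u1:Alice"`, `"u2:Bob"` on the user id yields one output per matching click, while the click for `u3` is dropped because it has no counterpart. This only pays off when the broadcast side fits comfortably in each worker's memory.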
The only significant difference from Beam is that we decided not to abstract serialization, so we need to add support for type hints because of type erasure.
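As a rough illustration of why type hints are needed: Java erases generic type arguments at runtime, so a lambda alone does not tell an executor that its output is, say, a `List<String>`. One common workaround is the "super type token" idiom, in which an anonymous subclass records its type argument in its class metadata; Beam's own `TypeDescriptor` relies on a similar anonymous-subclass trick. The `TypeHint` class below is a hypothetical sketch of that idiom, not Euphoria's actual type-hint API.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical "type hint" based on the super type token idiom; not Euphoria's API.
// Usage: new TypeHint<List<String>>() {}  (note the trailing braces: an anonymous
// subclass is what preserves the type argument despite erasure).
abstract class TypeHint<T> {

  private final Type type;

  protected TypeHint() {
    // The anonymous subclass's generic superclass is TypeHint<...>, and that
    // parameterization is kept in the class file, so it survives erasure.
    Type superclass = getClass().getGenericSuperclass();
    if (!(superclass instanceof ParameterizedType)) {
      throw new IllegalStateException("TypeHint must be created with a type argument");
    }
    this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
  }

  // The full generic type, e.g. java.util.List<java.lang.String>.
  Type getType() {
    return type;
  }
}
```

An executor could then use the captured `Type` to pick a serializer, which is roughly the role a serialization abstraction would otherwise play.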
Fluent API:
The API is fluent within one operator. It is designed to "lead the programmer", which means that they will only be offered methods that make sense after the last method they used (e.g. in ReduceByKey, we know that after keyBy, a reduceBy method should come next). It is implemented as a series of builders.
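The "series of builders" described above is often called a staged (or step) builder: each method returns a different type that exposes only the calls valid at that point, so the compiler rejects out-of-order usage. The sketch below illustrates the idea on a plain in-memory `List`; the class `ReduceByKeySketch` and its stage names are hypothetical and do not reproduce Euphoria's real ReduceByKey signatures.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical staged ("lead the programmer") builder; names are illustrative,
// not Euphoria's actual API. Each stage exposes only the methods valid next.
final class ReduceByKeySketch {

  private ReduceByKeySketch() {}

  // Entry point: the returned stage offers only keyBy(...).
  static <T> KeyByStage<T> of(List<T> input) {
    return new KeyByStage<>(input);
  }

  static final class KeyByStage<T> {
    private final List<T> input;

    KeyByStage(List<T> input) {
      this.input = input;
    }

    // After keyBy, the only step offered is reduceBy(...).
    <K> ReduceByStage<T, K> keyBy(Function<T, K> keyExtractor) {
      return new ReduceByStage<>(input, keyExtractor);
    }
  }

  static final class ReduceByStage<T, K> {
    private final List<T> input;
    private final Function<T, K> keyExtractor;

    ReduceByStage(List<T> input, Function<T, K> keyExtractor) {
      this.input = input;
      this.keyExtractor = keyExtractor;
    }

    // Terminal step: extracts a value per element and merges values that share
    // a key, using a Java 8 Stream collector.
    <V> Map<K, V> reduceBy(Function<T, V> valueExtractor, BinaryOperator<V> reducer) {
      return input.stream().collect(Collectors.toMap(keyExtractor, valueExtractor, reducer));
    }
  }
}
```

A word count then reads `ReduceByKeySketch.of(words).keyBy(w -> w).reduceBy(w -> 1L, Long::sum)`, and something like `ReduceByKeySketch.of(words).reduceBy(...)` does not compile, because the first stage has no such method.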
Davor:
Thanks, I'll contact you and will start the process of getting all the necessary paperwork signed on our side, so we can get things moving.
On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
Hi guys,
A DSL would be very welcome, in particular if it is fluent.
Open question: did you consider implementing the Stream API (surely extending it to have a BeamStream and a few more features like side inputs/outputs, etc.)? It would be very natural, easy to integrate anywhere, and would avoid having to discover a new API.
Hazelcast Jet did it, so I don't see why Beam couldn't.
On Dec 18, 2017 07:26, "Davor Bonaci" <da...@apache.org> wrote:
Hi David,
As JB noted, merging these two projects is a great idea. In fact, some of us have had those discussions in the past.
Legally, nothing in particular is strictly necessary, as the code seems to already be Apache 2.0 licensed. We don't, however, want to be perceived as making hostile forks, so it would be great to file a Software Grant Agreement with the ASF Secretary. I can help with the process, as necessary.
Project alignment-wise, there aren't any particular blockers that I am aware of. We welcome DSLs.
Technically, the code would start in a feature branch. During this stage, we'd need to validate a few things, including confirming that the code and dependencies match the ASF policy, automating testing in Beam's tooling, etc. At that point, we'd take a community vote to accept the component into master, and consider the author(s) for committership in the overall project.
Welcome to the ASF and Beam! We are thrilled to have you. Hope this helps, and please reach out if anybody on our end, including JB or myself, can help.
Davor
On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Hi David,
Generally speaking, having different fluent DSLs on top of the Beam SDK is great.
I would like to take a look at your wordcount examples to give you complete feedback. I like the idea, and a fluent Java DSL is valuable.
Let's wait for feedback from others. If we have a consensus, then I would be more than happy to help you with the donation (I worked on the Camel Java DSL a while ago, so I have some experience here).
Thanks!
Regards
JB
On 12/17/2017 07:00 PM, David Morávek wrote:
Hello,
First of all, thanks for the amazing work the Apache Beam community is doing!
In 2014, we started developing a runtime-independent Java 8 API that helps us create unified big-data processing flows. It has been used as a core building block of the Seznam.cz web crawler data infrastructure ever since. Its design principles and execution model are very similar to Apache Beam.
This API was open sourced in 2016 under the name Euphoria API:
https://github.com/seznam/euphoria
As it is very similar to Apache Beam, we feel that it is not worth duplicating effort in terms of developing new runtimes and fine-tuning current ones.
The main blocker for us to switch to Apache Beam is the lack of a Java 8 API. We propose the integration of the Euphoria API into Apache Beam as a Java 8 DSL, in order to share our effort with the community.
A simple example of Euphoria API usage can be found here:
https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
If you feel that the Beam community could benefit from our work, we would love to start working on integrating Euphoria into Apache Beam (we already have a working POC with a few basic operators implemented).
I look forward to hearing from you,
David
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
--
Best regards,
David Morávek