Hello JB,
can we help in any way to move things forward?
Thanks,
D.
On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Thanks Jan,
It makes sense.
Let me take a look at the code to understand the "interaction".
Regards
JB
On 12/18/2017 04:26 PM, Jan Lukavský wrote:
Hi JB,
basically you are not wrong. The project started about three or four
years ago with the goal of unifying batch and stream processing in a
single portable, executor-independent API. Because of that, it is
currently "close" to Beam in this sense. But we don't see much added
value in keeping this as a separate project when one of the key
differences is the API (not the model itself), so we would like to
focus on translation from the Euphoria API to Beam's SDK. That's why we
would like to see it as a DSL, so that it would be possible to use the
Euphoria API with Beam's runners as natively as possible.
I hope I didn't make the subject even more unclear. If so, I'll be happy
to explain anything in more detail. :-)
Jan
On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
Hi Jan,
Thanks for your answers.
However, they confused me ;)
Based on your reply, Euphoria seems more like a programming
model/SDK "close" to Beam than a DSL on top of an existing Beam
SDK.
Am I wrong?
Regards
JB
On 12/18/2017 03:44 PM, Jan Lukavský wrote:
Hi Ismael,
basically we adopted Beam's design regarding partitioning
(https://github.com/seznam/euphoria/issues/160) and implemented
sorting manually (https://github.com/seznam/euphoria/issues/158).
I'm not aware of any time model differences (Euphoria supports
ingestion and event time; we don't support processing time by
decision). Regarding other differences (looking at the Beam
capability matrix), I'd say that:
- we don't support stateful FlatMap (i.e. ParDo) for now
(https://github.com/seznam/euphoria/issues/192)
- we don't support side inputs (by decision for now, but this might
be reconsidered) or side outputs
(https://github.com/seznam/euphoria/issues/124)
- we support complete event-time windows (non-merging, merging,
aligned, unaligned) and time control
- we don't support processing time by decision (might be
reconsidered if a valid use-case is found)
- we support window triggering based on both time and data,
including discarding and accumulating modes (but not accumulating &
retracting)
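To make the difference between the discarding and accumulating modes concrete, here is a minimal plain-Java sketch. It assumes a hypothetical count-based trigger over a single window and a running sum as the window state; the class and method names are made up for illustration and are not part of the Euphoria or Beam APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not an Euphoria or Beam class. */
public class TriggerModeSketch {

  /**
   * Simulates a trigger that fires after every `every` elements of one window
   * and returns the sum emitted at each firing.
   */
  public static List<Integer> fire(List<Integer> elements, int every, boolean accumulating) {
    List<Integer> panes = new ArrayList<>();
    int state = 0; // window state: running sum
    int seen = 0;  // elements seen so far
    for (int element : elements) {
      state += element;
      seen++;
      if (seen % every == 0) {
        panes.add(state);   // the trigger fires and emits the current state
        if (!accumulating) {
          state = 0;        // discarding mode: forget what was already emitted
        }
      }
    }
    return panes;
  }

  public static void main(String[] args) {
    List<Integer> input = Arrays.asList(1, 2, 3, 4);
    System.out.println("accumulating: " + fire(input, 2, true));
    System.out.println("discarding:   " + fire(input, 2, false));
  }
}
```

In accumulating mode each firing repeats everything seen so far, so a downstream consumer has to overwrite earlier results; in discarding mode each firing carries only the new data, so the consumer has to combine the panes itself.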
All our executors (runners) - Flink, Spark and Local - implement
the complete model, which we enforce using an "operator test kit"
that all executors must pass. The Spark executor supports bounded
sources only (for now). As David said, we currently don't have a
serialization abstraction, so there is some work to be done in
that regard.
Our intention is to completely supersede Euphoria. We would like
to consider the possibility of using executors that do not rely on
Beam, but that is optional for now and should be straightforward.
We'd be happy to answer any more questions you might have and
thanks a lot!
Best,
Jan
On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
Hi,
It is great to see that you guys have reached the maturity to propose
this. Congratulations on your work and on the idea to contribute it to
Beam.
I remember a previous discussion with Jan about the model mismatch
between Euphoria and Beam caused by some design decisions in both
projects. I remember you guys had some issues with the way Beam's
sources do partitioning, as well as with Beam's lack of sorted data (on
shuffle, à la Hadoop). Also, if I remember correctly, the 'time' model
of Euphoria was simpler than Beam's. I bring all of this up because I am
curious about which parts of the Euphoria model you guys had to
sacrifice to support Beam, and which parts of Beam's model should still
be integrated into Euphoria (and whether there is a straightforward path
to do so).
If I understand correctly, if this gets merged into Apache, Euphoria's
current implementation would be superseded by this DSL? I am curious
because I would like to understand your level of investment in
supporting the future of this DSL.
Thanks and congrats again !
Ismaël
On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Depending on the donation, you would need an ICLA for each contributor,
and a CCLA in addition to the SGA.
You can sync with Davor and me on the legal stuff.
However, I would wait a little bit to get feedback from the whole team
before starting a formal vote.
I would be happy to start the formal vote.
Regards
JB
On 12/18/2017 10:03 AM, David Morávek wrote:
Hello,
Thanks for the awesome feedback!
Romain:
We already use the Java Stream API in all operators where it makes
sense (e.g. ReduceByKey). I'm still not sure it was a good choice, but
it can easily be converted to an iterator anyway.
Side output support is coming soon; we have already done some initial
work on this.
Side inputs are not supported in the way you are used to from Beam,
because they can be replaced by a Join operator on the same key (if
annotated with broadcastHashJoin, it will be turned into a map-side
join).
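As a rough illustration of why a map-side (broadcast hash) join can stand in for a side input, here is a minimal plain-Java sketch. It assumes the small side fits in memory as a hash map; the class and method names are hypothetical and do not come from the Euphoria or Beam APIs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not an Euphoria or Beam class. */
public class MapSideJoinSketch {

  /**
   * Inner-joins every (key, value) element of the large input against a small
   * lookup table held fully in memory. A broadcast hash join does the same on
   * each worker, so the large input never needs to be shuffled.
   */
  public static List<String> join(List<Map.Entry<String, Integer>> large,
                                  Map<String, String> small) {
    List<String> joined = new ArrayList<>();
    for (Map.Entry<String, Integer> element : large) {
      String right = small.get(element.getKey()); // plain hash lookup
      if (right != null) {                        // drop elements with no match
        joined.add(element.getKey() + ":" + element.getValue() + ":" + right);
      }
    }
    return joined;
  }

  public static void main(String[] args) {
    Map<String, String> countries = new HashMap<>();
    countries.put("cz", "Czechia");
    countries.put("fr", "France");

    List<Map.Entry<String, Integer>> visits = new ArrayList<>();
    visits.add(new SimpleEntry<>("cz", 3));
    visits.add(new SimpleEntry<>("de", 5)); // no match in the small side
    visits.add(new SimpleEntry<>("fr", 7));

    System.out.println(join(visits, countries));
  }
}
```

The small table plays the role a side input would play in a ParDo: it is available in full wherever an element of the main input is processed.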
The only significant difference from Beam is that we decided not to
abstract serialization, so we need to add support for type hints
because of type erasure.
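For readers less familiar with the erasure problem, the sketch below shows one common way a type hint can carry generic type information to runtime, the "super type token" idiom. The TypeHint class here is a hypothetical illustration and is not claimed to match the Euphoria implementation.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

/** Hypothetical illustration only; not an Euphoria or Beam class. */
public class TypeHintSketch {

  /** Captures the type argument T of the anonymous subclass that creates it. */
  public abstract static class TypeHint<T> {
    private final Type type;

    protected TypeHint() {
      // A subclass keeps its generic superclass in class metadata, so the
      // type argument is still readable at runtime despite erasure.
      ParameterizedType superType =
          (ParameterizedType) getClass().getGenericSuperclass();
      this.type = superType.getActualTypeArguments()[0];
    }

    public Type getType() {
      return type;
    }
  }

  public static void main(String[] args) {
    // The trailing {} creates the anonymous subclass that records List<String>.
    TypeHint<List<String>> hint = new TypeHint<List<String>>() {};
    System.out.println(hint.getType().getTypeName());
  }
}
```

Without such a hint, a lambda passed to an operator only exposes erased types at runtime, so a framework cannot pick a serializer for the element type on its own.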
Fluent API:
The API is fluent within one operator. It is designed to "lead the
programmer", which means that they will only be offered methods that
make sense after the last method they used (e.g. in ReduceByKey, we
know that after keyBy one of the reduceBy methods should come). It is
implemented as a series of builders.
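The "series of builders" idea can be sketched as a staged builder, where each step returns a type that exposes only the next valid method. The code below is a simplified, hypothetical illustration that runs on an in-memory list; the class names and the of/keyBy/reduceBy/output signatures are assumptions and not the real Euphoria API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Hypothetical illustration only; not the Euphoria ReduceByKey operator. */
public class StagedBuilderSketch {

  /** Entry point: the only thing offered first is keyBy. */
  public static <T> KeyByStage<T> of(List<T> input) {
    return new KeyByStage<>(input);
  }

  public static final class KeyByStage<T> {
    private final List<T> input;

    private KeyByStage(List<T> input) {
      this.input = input;
    }

    public <K extends Comparable<K>> ReduceByStage<T, K> keyBy(Function<T, K> keyFn) {
      return new ReduceByStage<>(input, keyFn);
    }
  }

  /** After keyBy, the only method offered is reduceBy. */
  public static final class ReduceByStage<T, K extends Comparable<K>> {
    private final List<T> input;
    private final Function<T, K> keyFn;

    private ReduceByStage(List<T> input, Function<T, K> keyFn) {
      this.input = input;
      this.keyFn = keyFn;
    }

    public OutputStage<T, K> reduceBy(BinaryOperator<T> reducer) {
      return new OutputStage<>(input, keyFn, reducer);
    }
  }

  /** Final stage: everything is configured, so the result can be produced. */
  public static final class OutputStage<T, K extends Comparable<K>> {
    private final List<T> input;
    private final Function<T, K> keyFn;
    private final BinaryOperator<T> reducer;

    private OutputStage(List<T> input, Function<T, K> keyFn, BinaryOperator<T> reducer) {
      this.input = input;
      this.keyFn = keyFn;
      this.reducer = reducer;
    }

    public Map<K, T> output() {
      Map<K, T> result = new TreeMap<>(); // sorted by key for stable printing
      for (T element : input) {
        result.merge(keyFn.apply(element), element, reducer);
      }
      return result;
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> sums = of(Arrays.asList(1, 2, 3, 4, 5))
        .keyBy(n -> n % 2 == 0 ? "even" : "odd")
        .reduceBy(Integer::sum)
        .output();
    System.out.println(sums); // {even=6, odd=9}
  }
}
```

Because KeyByStage has no reduceBy or output method, calling them out of order fails at compile time, and IDE completion shows only the step that makes sense next.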
Davor:
Thanks, I'll contact you and will start the process of having all the
necessary paperwork signed on our side, so we can get things moving.
On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
Hi guys
A DSL would be very welcome, in particular if fluent.
Open question: did you consider implementing the Stream API (surely
extending it to have a BeamStream and a few more features like sides,
etc.)? It would be very natural, easy to integrate anywhere, and would
avoid having to discover a new API. Hazelcast Jet did it, so I don't
see why Beam couldn't.
On Dec 18, 2017 at 07:26, "Davor Bonaci" <da...@apache.org> wrote:
Hi David,
As JB noted, merging these two projects is a great idea. In fact, some
of us have had those discussions in the past.
Legally, nothing in particular is strictly necessary, as the code seems
to already be Apache 2.0 licensed. We don't, however, want to be
perceived as making hostile forks, so it would be great to file a
Software Grant Agreement with the ASF Secretary. I can help with the
process, as necessary.
Project alignment-wise, there aren't any particular blockers that I am
aware of. We welcome DSLs.
Technically, the code would start in a feature branch. During this
stage, we'd need to validate a few things, including confirming that
the code and dependencies match ASF policy, automating testing in
Beam's tooling, etc. At that point, we'd take a community vote to
accept the component into master, and consider the author(s) for
committership in the overall project.
Welcome to the ASF and Beam -- we are thrilled to have you! Hope this
helps, and please reach out if anybody on our end can help, including
JB or myself.
Davor
On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Hi David,
Generally speaking, having different fluent DSLs on top of the Beam SDK
is great.
I would like to take a look at your wordcount examples to give you
complete feedback. I like the idea, and a fluent Java DSL is valuable.
Let's wait for feedback from others. If we have a consensus, then I
would be more than happy to help you with the donation (I worked on
the Camel Java DSL a while ago, so I have some experience here).
Thanks !
Regards
JB
On 12/17/2017 07:00 PM, David Morávek wrote:
Hello,
First of all, thanks for the amazing work the Apache Beam community is
doing!
In 2014, we started development of a runtime-independent Java 8 API
that helps us create unified big-data processing flows. It has been
used as a core building block of the Seznam.cz web crawler data
infrastructure ever since. Its design principles and execution model
are very similar to Apache Beam's.
This API was open sourced in 2016 under the name Euphoria API:
https://github.com/seznam/euphoria
As it is very similar to Apache Beam, we feel that it is not worth
duplicating effort on developing new runtimes and fine-tuning
current ones.
The main blocker for us to switch to Apache Beam is the lack of a
Java 8 API. We propose integrating the Euphoria API into Apache Beam
as a Java 8 DSL, in order to share our effort with the community.
A simple example of Euphoria API usage can be found here:
https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
If you feel that the Beam community could benefit from our work, we
would love to start working on Euphoria integration into Apache Beam
(we already have a working POC with a few basic operators implemented).
I look forward to hearing from you,
David
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
--
Best regards
David Morávek