Graal would not be a viable solution for the reasons Henning and Andrew
mentioned. Put another way, when users choose a programming language
they don’t choose only a ‘friendly’ syntax or programming model; they
also choose the ecosystem that comes with it, and the libraries that make
their life easier. However, isolating these user libraries/dependencies is a
hard problem, and so far the standard solution to this problem is to use
operating system containers via Docker.

From day zero, the Beam vision has been to run pipelines written in multiple
languages on runners for multiple systems, and so far we are not doing this,
in particular in the Apache runners. The portability work is the cleanest
way to achieve this vision given the constraints.

I agree, however, that for the Java SDK to Java runner case this can
represent additional pain. Ideally, Docker should not be a requirement for
Java users with the Direct runner, and debugging a pipeline should be as
easy as it is today. I think the Universal Local Runner exists to cover
the portable case, but after looking at this JIRA I am not sure whether
unification is coming (and, by consequence, whether Docker would be mandatory).
https://issues.apache.org/jira/browse/BEAM-4239

I suppose that the distributed runners must implement the full
Portability APIs to be considered Beam multi-language compliant, but they
may prefer, for performance reasons, to translate the Java-to-Java case
without the portability APIs.
On Sat, May 5, 2018 at 9:11 AM Reuven Lax <re...@google.com> wrote:

> A beam cluster with the spark runner would include a spark cluster, plus
what's needed for portability, plus the beam sdk.

> On Fri, May 4, 2018, 11:55 PM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:



>> On 5 May 2018 08:43, "Reuven Lax" <re...@google.com> wrote:

>> I don't believe we enforce Docker anywhere. In fact, if someone wanted to
run an all-Windows Beam cluster, they would probably not use Docker for
their runner (Docker runs on Windows, but not efficiently).



>> Or sometimes doesn't run at all; a colleague hit that yesterday :(.

>> What is a "Beam cluster", as opposed to a Spark or Flink cluster? How
would it work on Windows servers?


>> On Fri, May 4, 2018, 11:19 PM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:



>>> 2018-05-05 2:33 GMT+02:00 Andrew Pilloud <apill...@google.com>:

>>>> What docker really buys is a package format and runtime environment
that is language and operating system agnostic. The docker packaging and
runtime format is the de facto standard for portable applications such as
this, and there is a group trying to turn it into an actual standard.

>>>> I would agree with you that dockerd has become bloated, but there are
projects that solve that. There is no longer lock-in to dockerd: there are
package-format-compatible Docker replacements that eliminate the
performance issues and overhead associated with Docker. CRI-O (
https://github.com/kubernetes-incubator/cri-o) is a really cool Red Hat
project which is a minimalist replacement for Docker. I was recently
working at a startup where I migrated our "data mover" appliance from
Docker to CRI-O. Our application was able to get direct access to the
Ethernet driver and block devices, which enabled a huge performance boost,
and we were still able to run containers produced by Docker without
modification.

>>>> You mention that docker is "detail of one runner+vendor corrupting all
the project and adding complexity and work to everyone". It sounds like you
have a specific example you'd like to share? Is there a runner that is
unable to move to portability because of docker?


>>> The IBM one, for instance, and some custom ones like a Hazelcast-based
one, etc. More generally, any runner developed outside Beam itself; even if
we take a snapshot today, most of Beam's own runners have the same pitfall.

>>> Note: I never said Docker was a bad technology. Let me try to clarify.

>>> The main issue is that you enforce Docker usage, which is still a trend.
It is like Scala, which was promising to kill Java; check where it is today...
>>> It is starting to get tooling, but it also has a big impact on the
deployment side, and for a good number of Beam users who deploy outside the
cloud it is an issue.
>>> Keep in mind that Beam is embeddable by design; it is not a runner
environment. The Docker choice imposes an environment, which is inconsistent
with Beam's design itself, and this is where this choice blocks.



>>>> Andrew

>>>> On Fri, May 4, 2018 at 4:32 PM Henning Rohde <hero...@google.com>
wrote:

>>>>> Romain,

>>>>> Docker, unlike selinux, solves a great number of tangible problems
for us with IMO a relatively small tax. It does not have to be the only
way. Some of the concerns you bring up along with possibilities were also
discussed here: https://s.apache.org/beam-fn-api-container-contract. I
encourage you to take a look.

>>>>> Thanks,
>>>>>   Henning


>>>>> On Fri, May 4, 2018 at 3:18 PM Romain Manni-Bucau <
rmannibu...@gmail.com> wrote:



>>>>>> On 4 May 2018 21:31, "Henning Rohde" <hero...@google.com> wrote:

>>>>>> I disagree with the characterization of docker and the implications
made towards portability. Graal looks like a neat project (and I never
thought I would live to see the phrase "Practical Partial Evaluation" ..),
but it doesn't address the needs of portability. In addition to Luke's
examples, Go and most other languages don't work on it either. Docker
containers also address packaging, OS dependencies, conflicting versions
and distribution aspects in addition to truly universal language support.


>>>>>> This is wrong: Docker also has its conflicts and is not universal
(it fails easily on Windows and Mac, as host or not, and cloud vendors add
layers that limit or corrupt it). It is also an imposed infrastructure
constraint and a vendor lock-in that is not welcome in Beam IMHO.

>>>>>> This is my main concern. All the work done looks like an
implementation detail of one runner+vendor that corrupts the whole project
and adds complexity and work for everyone, instead of being kept localised
(which is technically possible).

>>>>>> Would you accept it if I forced you to use SELinux? Using Docker is the
same kind of constraint.


>>>>>> That said, it's entirely fine for some runners to use Jython, Graal,
etc to provide a specialized offering similar to the direct runners, but it
would be disjoint from portability IMO.

>>>>>> On Fri, May 4, 2018 at 10:14 AM Romain Manni-Bucau <
rmannibu...@gmail.com> wrote:



>>>>>>> On 4 May 2018 17:55, "Lukasz Cwik" <lc...@google.com> wrote:

>>>>>>> I did take a look at Graal a while back when thinking about how
execution environments could be defined; my concerns were related to it not
supporting all of the features of a language.
>>>>>>> For example, it's typical for Python to load and call native
libraries, and Graal can only execute C/C++ code that has been compiled to
LLVM.
>>>>>>> Also, a good number of people interested in using ML libraries will
want access to GPUs to improve performance, which I believe Graal can't
support.

>>>>>>> It can be a very useful way to run simple lambda functions written
in some language directly without needing a Docker environment, but
you could probably use something even lighter-weight than Graal that is
language-specific, like Jython.



>>>>>>> Right, the JSR 223 implementations work very well, but you can also
get a performance boost using native bindings (for instance, a V8 Java
binding for JavaScript). That is far more efficient than Docker most of the
time and not code-intrusive at all in runners, so it is likely easier to
adopt and maintain. That said, all of this is doable behind JSR 223, so it
may not be a big deal in terms of API. We just need to ensure the portability
work stays clean and actually portable, and doesn't impact runners the way
the PoCs done until today did.

>>>>>>> Works for me.
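For readers unfamiliar with the JSR 223 (javax.script) route discussed above, here is a minimal Java sketch of evaluating code written in another language in-process, with no container. It asks `ScriptEngineManager` for an engine by name and returns null when none is installed. The class and method names are hypothetical (not part of Beam), and which engines exist depends on the JDK and the classpath; the "python" engine name assumes something like Jython has been added.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper, not part of Beam: evaluates code in another language
// in-process through JSR 223 (javax.script), without any container.
public class Jsr223Sketch {

    // Names of all JSR 223 engines discovered on the current classpath.
    // The result depends on the JDK version and on which jars are present.
    public static List<String> availableEngines() {
        List<String> names = new ArrayList<>();
        for (ScriptEngineFactory factory : new ScriptEngineManager().getEngineFactories()) {
            names.addAll(factory.getNames());
        }
        return names;
    }

    // Evaluates the script with the named engine, or returns null when no
    // engine is registered under that name. Script errors are rethrown unchecked.
    public static Object evalOrNull(String engineName, String script) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return null;
        }
        try {
            return engine.eval(script);
        } catch (ScriptException e) {
            throw new IllegalArgumentException("Script failed in engine " + engineName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Available engines: " + availableEngines());
        // Assumption: "python" resolves only if an engine such as Jython is on the classpath.
        Object result = evalOrNull("python", "1 + 1");
        System.out.println(result == null ? "No python engine installed" : "1 + 1 = " + result);
    }
}
```

On a plain JDK this typically reports that no python engine is installed; adding an engine jar changes that without touching the calling code. Note that this approach does nothing for native libraries or GPU access, the limitations raised earlier in the thread.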


>>>>>>> On Thu, May 3, 2018 at 10:05 PM Romain Manni-Bucau <
rmannibu...@gmail.com> wrote:

>>>>>>>> Hi guys

>>>>>>>> For some time there have been efforts to add language-portable
support to Beam, but I can't really find a case where it "works" being based
on Docker, except for some vendor-specific infrastructure.

>>>>>>>> The current solution:

>>>>>>>> 1. Is runner-intrusive (which is bad for Beam and prevents
adoption by big data vendors)
>>>>>>>> 2. Is based on Docker (which assumes a runtime environment, is
very ops/infra-intrusive, and is quite often likely too $$ for what it brings)

>>>>>>>> Has anyone had a look at Graal, which seems like a way to make the
feature doable in a lighter manner, and optimized compared to the default
JSR 223 implementations?
