To add to that: Romain, if you are really excited about Graal as a project,
here are some constructive suggestions as to what you can do on a
reasonably short timeframe:
- Propose/prototype a design for writing UDFs in Beam SQL using Graal
- Go through the portability-related design documents, come up with a more
precise assessment of what parts are actually dependent on Docker's
container format and/or on Docker itself, and propose a plan for untangling
this dependency and opening the door to other mechanisms of cross-language
execution

On Sat, May 5, 2018 at 12:50 PM Eugene Kirpichov <kirpic...@google.com>
wrote:

> Graal is a very young project, currently nowhere near the level of
> maturity or completeness that Beam would need in order to fully bet its
> portability vision on it:
> - Graal currently only claims to support Java and JavaScript, with Ruby
> and R in the status of "some applications may run", Python support "just
> beginning", and Go lacking altogether.
> - Regarding existing production usage, the Graal FAQ says it is "a project
> with new innovative technology in its early stages."
>
> That said, as Graal matures, I think it would be reasonable to keep an eye
> on it as a potential future lightweight alternative to containers for
> pipelines where Graal's level of support is sufficient.
>
> Please also keep in mind that execution of user code is only a small part
> of the overall portability picture, and dependency on Docker is an even
> smaller part of that (there is only 1 mention of the word "Docker" in all
> of Beam's portability protos, and the mention is in an out-of-date TODO
> comment). I hope this addresses your concerns.
>
> On Sat, May 5, 2018 at 11:49 AM Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>
>> Agree
>>
>> The JVM is still mainstream for big data, and it is trivial to have a
>> remote facade to support native code, but there is no point in having it in
>> runners; it concerns only some particular transforms, or even just DoFns
>> and sources...
>>
>>
>>> On May 5, 2018 at 19:03, "Andrew Pilloud" <apill...@google.com> wrote:
>>
>>> Thanks for the examples earlier, I think Hazelcast is a great example
>>> of something portability might make more difficult. I'm not working on
>>> portability, but my understanding is that the data sent to the runner is a
>>> blob of code and the name of the container to run it in. A runner with a
>>> native language (java on Hazelcast for example) could run the code directly
>>> without the container if it is in a language it supports. So when Hazelcast
>>> sees a known java container specified, it just loads the java blob and runs
>>> it. When it sees another container it rejects the pipeline. You could use
>>> Graal in the Hazelcast runner to do this for a number of languages. I would
>>> expect that this could also be done in the direct runner, which similarly
>>> provides a native java environment, so portable Java pipelines can be
>>> tested without docker?
>>>
>>> For another way to frame this: if Beam was originally written in Go, we
>>> would be having a different discussion. A pipeline written entirely in java
>>> wouldn't be possible, so instead to enable Hazelcast, we would have to be
>>> able to run the java from portability without running the container.
>>>
>>> Andrew
>>>
>>> On Sat, May 5, 2018 at 1:48 AM Romain Manni-Bucau <rmannibu...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> 2018-05-05 9:27 GMT+02:00 Ismaël Mejía <ieme...@gmail.com>:
>>>>
>>>>> Graal would not be a viable solution for the reasons Henning and Andrew
>>>>> mentioned, or put in other words, when users choose a programming
>>>>> language they don’t choose only a ‘friendly’ syntax or programming model,
>>>>> they also choose the ecosystem that comes with it, and the libraries that
>>>>> make their life easier. However, isolating these user
>>>>> libraries/dependencies is a hard problem, and so far the standard
>>>>> solution to this problem is to use operating-system containers via Docker.
>>>>>
>>>>
>>>> Graal solves that, Ismaël. It is the same kind of experience as running
>>>> npm libs on Nashorn, but with a more unified API to run software written
>>>> in any language.
>>>>
>>>>
>>>>>
>>>>> The Beam vision from day zero is to run pipelines written in multiple
>>>>> languages in runners in multiple systems, and so far we are not doing
>>>>> this, in particular in the Apache runners. The portability work is the
>>>>> cleanest way to achieve this vision given the constraints.
>>>>>
>>>>
>>>> Hmm, did I read it wrong, and we don't have specific integration of the
>>>> portable API in runners? This is what is messing up the runners and
>>>> limiting Beam adoption on existing runners.
>>>> The portable API is a feature buildable on top of a runner, not in runners.
>>>> Just as a runner implementing the 5-6 primitives can run anything, the
>>>> portable API should just rely on that and not require more integration.
>>>> That doesn't prevent deeper integrations, as for some higher-level
>>>> primitives existing in runners, but that is not the case today for runners,
>>>> so it shouldn't exist IMHO.
>>>>
>>>>
>>>>>
>>>>> I agree, however, that for the Java SDK to Java runner case this can
>>>>> represent additional pain. Docker ideally should not be a requirement for
>>>>> Java users with the Direct runner, and debugging a pipeline should be as
>>>>> easy as it is today. I think the Universal Local Runner exists to cover
>>>>> the Portable case, but after looking at this JIRA I am not sure if
>>>>> unification is coming (and, as a consequence, whether docker would be
>>>>> mandatory).
>>>>> https://issues.apache.org/jira/browse/BEAM-4239
>>>>>
>>>>> I suppose that the distributed runners must implement the full
>>>>> Portability APIs to be considered Beam multi-language compliant, but for
>>>>> performance reasons they may prefer to translate the Java-to-Java case
>>>>> without the portability APIs.
>>>>>
>>>>
>>>>
>>>> This is my issue: language portability must NOT impact runners at all,
>>>> it is just a way to forward primitives to a runner.
>>>> See it as a layer rewriting the pipeline and submitting it. No need to
>>>> modify any runner.
>>>>
>>>>
>>>>> On Sat, May 5, 2018 at 9:11 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>> > A beam cluster with the spark runner would include a spark cluster,
>>>>> > plus what's needed for portability, plus the beam sdk.
>>>>>
>>>>> > On Fri, May 4, 2018, 11:55 PM Romain Manni-Bucau <rmannibu...@gmail.com>
>>>>> > wrote:
>>>>>
>>>>>
>>>>>
>>>>> >> On May 5, 2018 at 08:43, "Reuven Lax" <re...@google.com> wrote:
>>>>>
>>>>> >> I don't believe we enforce docker anywhere. In fact if someone wanted
>>>>> >> to run an all-windows beam cluster, they would probably not use docker
>>>>> >> for their runner (docker runs on Windows, but not efficiently).
>>>>>
>>>>>
>>>>>
>>>>> >> Or doesn't run sometimes - a colleague hit that yesterday :(.
>>>>>
>>>>> >> What is a "beam cluster", as opposed to a Spark or Flink cluster? How
>>>>> >> would it work on Windows servers?
>>>>>
>>>>>
>>>>> >> On Fri, May 4, 2018, 11:19 PM Romain Manni-Bucau <rmannibu...@gmail.com>
>>>>> >> wrote:
>>>>>
>>>>>
>>>>>
>>>>> >>> 2018-05-05 2:33 GMT+02:00 Andrew Pilloud <apill...@google.com>:
>>>>>
>>>>> >>>> What docker really buys is a package format and runtime environment
>>>>> >>>> that is language and operating system agnostic. The docker packaging
>>>>> >>>> and runtime format is the de facto standard for portable applications
>>>>> >>>> such as this, and there is a group trying to turn it into an actual
>>>>> >>>> standard.
>>>>>
>>>>> >>>> I would agree with you that dockerd has become bloated, but there are
>>>>> >>>> projects that solve that. There is no longer lock-in to dockerd: there
>>>>> >>>> are package-format-compatible docker replacements that eliminate the
>>>>> >>>> performance issues and overhead associated with docker. CRI-O
>>>>> >>>> (https://github.com/kubernetes-incubator/cri-o) is a really cool Red Hat
>>>>> >>>> project which is a minimalist replacement for docker. I was recently
>>>>> >>>> working at a startup where I migrated our "data mover" appliance from
>>>>> >>>> Docker to CRI-O. Our application was able to get direct access to the
>>>>> >>>> ethernet driver and block devices, which enabled a huge performance
>>>>> >>>> boost, but we were also able to run containers produced by docker
>>>>> >>>> without modification.
>>>>>
>>>>> >>>> You mention that docker is "detail of one runner+vendor corrupting all
>>>>> >>>> the project and adding complexity and work to everyone". It sounds
>>>>> >>>> like you have a specific example you'd like to share? Is there a
>>>>> >>>> runner that is unable to move to portability because of docker?
>>>>>
>>>>>
>>>>> >>> The IBM one for instance, some custom ones like a Hazelcast-based one,
>>>>> >>> etc... More generally, any runner developed outside Beam itself. Even
>>>>> >>> if we take a snapshot today, most of Beam's own runners have the same
>>>>> >>> pitfall.
>>>>>
>>>>> >>> Note: I never said Docker was a bad technology or anything like that.
>>>>> >>> Let me try to clarify.
>>>>>
>>>>> >>> The main issue is that you enforce Docker usage, which is still a
>>>>> >>> trend. It is like Scala, which was promising to kill Java; check where
>>>>> >>> it is today...
>>>>> >>> Tooling for it is starting to appear, but it also has a big impact on
>>>>> >>> the deployment side, and for a good number of Beam users who deploy
>>>>> >>> outside the cloud it is an issue.
>>>>> >>> Keep in mind that Beam is embeddable by design; it is not a runner
>>>>> >>> environment. With the Docker choice it imposes an environment, which is
>>>>> >>> inconsistent with Beam's design itself, and this is where this choice
>>>>> >>> blocks.
>>>>>
>>>>>
>>>>>
>>>>> >>>> Andrew
>>>>>
>>>>> >>>> On Fri, May 4, 2018 at 4:32 PM Henning Rohde <hero...@google.com>
>>>>> >>>> wrote:
>>>>>
>>>>> >>>>> Romain,
>>>>>
>>>>> >>>>> Docker, unlike selinux, solves a great number of tangible problems
>>>>> >>>>> for us with IMO a relatively small tax. It does not have to be the
>>>>> >>>>> only way. Some of the concerns you bring up along with possibilities
>>>>> >>>>> were also discussed here:
>>>>> >>>>> https://s.apache.org/beam-fn-api-container-contract. I encourage you
>>>>> >>>>> to take a look.
>>>>>
>>>>> >>>>> Thanks,
>>>>> >>>>>   Henning
>>>>>
>>>>>
>>>>> >>>>> On Fri, May 4, 2018 at 3:18 PM Romain Manni-Bucau
>>>>> >>>>> <rmannibu...@gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> >>>>>> On May 4, 2018 at 21:31, "Henning Rohde" <hero...@google.com> wrote:
>>>>>
>>>>> >>>>>> I disagree with the characterization of docker and the implications
>>>>> >>>>>> made towards portability. Graal looks like a neat project (and I
>>>>> >>>>>> never thought I would live to see the phrase "Practical Partial
>>>>> >>>>>> Evaluation" ..), but it doesn't address the needs of portability. In
>>>>> >>>>>> addition to Luke's examples, Go and most other languages don't work
>>>>> >>>>>> on it either. Docker containers also address packaging, OS
>>>>> >>>>>> dependencies, conflicting versions and distribution aspects in
>>>>> >>>>>> addition to truly universal language support.
>>>>>
>>>>>
>>>>> >>>>>> This is wrong: Docker also has its conflicts and is not universal.
>>>>> >>>>>> It fails easily on Windows and Mac (as host or not), cloud vendors
>>>>> >>>>>> add layers that limit or corrupt it, and it is an imposed infra
>>>>> >>>>>> constraint and a vendor lock-in that is not welcome in Beam IMHO.
>>>>>
>>>>> >>>>>> This is my main concern. All the work done looks like an
>>>>> >>>>>> implementation detail of one runner+vendor corrupting the whole
>>>>> >>>>>> project and adding complexity and work for everyone instead of
>>>>> >>>>>> keeping it localised (technically it is possible).
>>>>>
>>>>> >>>>>> Would you accept it if I forced you to use SELinux? Using Docker is
>>>>> >>>>>> the same kind of constraint.
>>>>>
>>>>>
>>>>> >>>>>> That said, it's entirely fine for some runners to use Jython, Graal,
>>>>> >>>>>> etc to provide a specialized offering similar to the direct runners,
>>>>> >>>>>> but it would be disjoint from portability IMO.
>>>>>
>>>>> >>>>>> On Fri, May 4, 2018 at 10:14 AM Romain Manni-Bucau
>>>>> >>>>>> <rmannibu...@gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> >>>>>>> On May 4, 2018 at 17:55, "Lukasz Cwik" <lc...@google.com> wrote:
>>>>>
>>>>> >>>>>>> I did take a look at Graal a while back when thinking about how
>>>>> >>>>>>> execution environments could be defined; my concerns were related
>>>>> >>>>>>> to it not supporting all of the features of a language.
>>>>> >>>>>>> For example, it's typical for Python to load and call native
>>>>> >>>>>>> libraries, and Graal can only execute C/C++ code that has been
>>>>> >>>>>>> compiled to LLVM.
>>>>> >>>>>>> Also, a good number of people interested in using ML libraries
>>>>> >>>>>>> will want access to GPUs to improve performance, which I believe
>>>>> >>>>>>> Graal can't support.
>>>>>
>>>>> >>>>>>> It can be a very useful way to run simple lambda functions written
>>>>> >>>>>>> in some language directly without needing to use a docker
>>>>> >>>>>>> environment, but you could probably use something even lighter
>>>>> >>>>>>> weight than Graal that is language specific, like Jython.
>>>>>
>>>>>
>>>>>
>>>>> >>>>>>> Right, the JSR 223 implementation works very well, but you can also
>>>>> >>>>>>> get a perf boost using native engines (like the V8 Java binding for
>>>>> >>>>>>> JS, for instance). It is far more efficient than Docker most of the
>>>>> >>>>>>> time and not code-intrusive at all in runners, so it is likely
>>>>> >>>>>>> easier to adopt and maintain. That said, all of it is doable behind
>>>>> >>>>>>> JSR 223, so it is maybe not a big deal in terms of API. We just
>>>>> >>>>>>> need to ensure the portability work stays clean and actually
>>>>> >>>>>>> portable and doesn't impact runners the way the PoCs done so far
>>>>> >>>>>>> did.
>>>>>
>>>>> >>>>>>> Works for me.
>>>>>
>>>>>
>>>>> >>>>>>> On Thu, May 3, 2018 at 10:05 PM Romain Manni-Bucau
>>>>> >>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>
>>>>> >>>>>>>> Hi guys
>>>>>
>>>>> >>>>>>>> For some time there have been efforts to add language-portable
>>>>> >>>>>>>> support to Beam, but I can't really find a case where it "works"
>>>>> >>>>>>>> when based on Docker, except on some vendor-specific infra.
>>>>>
>>>>> >>>>>>>> Current solution:
>>>>>
>>>>> >>>>>>>> 1. Is runner-intrusive (which is bad for Beam and prevents
>>>>> >>>>>>>> adoption by big data vendors)
>>>>> >>>>>>>> 2. Is based on Docker (which assumes a runtime environment, is
>>>>> >>>>>>>> very ops/infra-intrusive, and is quite often too $$ for what it
>>>>> >>>>>>>> brings)
>>>>>
>>>>> >>>>>>>> Has anyone had a look at Graal, which seems a way to make the
>>>>> >>>>>>>> feature doable in a lighter, more optimized manner compared to
>>>>> >>>>>>>> default JSR 223 implementations?
>>>>>
>>>>
>>>>
