Re: Graal instead of docker?

Romain Manni-Bucau Sat, 05 May 2018 11:49:32 -0700

Agreed.

The JVM is still mainstream for big data, and it is trivial to have a remote
facade to support native code, but there is no point in having it in the
runners: it only concerns some particular transforms, or even just DoFns and
sources...



On May 5, 2018 at 19:03, "Andrew Pilloud" <apill...@google.com> wrote:

> Thanks for the examples earlier, I think Hazelcast is a great example of
> something portability might make more difficult. I'm not working on
> portability, but my understanding is that the data sent to the runner is a
> blob of code and the name of the container to run it in. A runner with a
> native language (java on Hazelcast for example) could run the code directly
> without the container if it is in a language it supports. So when Hazelcast
> sees a known java container specified, it just loads the java blob and runs
> it. When it sees another container it rejects the pipeline. You could use
> Graal in the Hazelcast runner to do this for a number of languages. I would
> expect that this could also be done in the direct runner, which similarly
> provides a native java environment, so portable Java pipelines can be
> tested without docker?
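
A minimal sketch of the dispatch described above, assuming the runner receives a container name and an opaque code blob. The `Stage` type, the `EnvironmentDispatch` class, and the container names are hypothetical illustrations, not Beam or Hazelcast APIs: a known container name is executed in-process, and anything else is rejected.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: run a stage in-process when its container is known, reject it otherwise. */
final class EnvironmentDispatch {

    /** Hypothetical stand-in for what the runner receives: a container name plus a code blob. */
    static final class Stage {
        final String containerName;
        final byte[] payload;

        Stage(String containerName, byte[] payload) {
            this.containerName = containerName;
            this.payload = payload;
        }
    }

    // Container names this runner can execute natively, mapped to an in-process loader.
    private final Map<String, Function<byte[], String>> inProcessLoaders;

    EnvironmentDispatch(Map<String, Function<byte[], String>> inProcessLoaders) {
        this.inProcessLoaders = inProcessLoaders;
    }

    /** Runs the blob directly if the container is known; rejects the pipeline otherwise. */
    String execute(Stage stage) {
        Function<byte[], String> loader = inProcessLoaders.get(stage.containerName);
        if (loader == null) {
            throw new UnsupportedOperationException(
                "Pipeline rejected, unsupported container: " + stage.containerName);
        }
        return loader.apply(stage.payload);
    }

    public static void main(String[] args) {
        // "example/java-sdk" and "example/python-sdk" are made-up container names.
        EnvironmentDispatch runner = new EnvironmentDispatch(
            Map.of("example/java-sdk", blob -> "ran " + blob.length + " bytes in-process"));
        System.out.println(runner.execute(new Stage("example/java-sdk", new byte[] {1, 2, 3})));
        try {
            runner.execute(new Stage("example/python-sdk", new byte[] {4}));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the registry of loaders is the only runner-specific part; a Graal-backed loader for another language could presumably be registered under its own container name in the same way.
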
>
> For another way to frame this: if Beam was originally written in Go, we
> would be having a different discussion. A pipeline written entirely in java
> wouldn't be possible, so instead to enable Hazelcast, we would have to be
> able to run the java from portability without running the container.
>
> Andrew
>
> On Sat, May 5, 2018 at 1:48 AM Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>
>>
>>
>> 2018-05-05 9:27 GMT+02:00 Ismaël Mejía <ieme...@gmail.com>:
>>
>>> Graal would not be a viable solution for the reasons Henning and Andrew
>>> mentioned, or put in other words, when users choose a programming
>>> language
>>> they don’t choose only a ‘friendly’ syntax or programming model, they
>>> choose also the ecosystem that comes with it, and the libraries that make
>>> their life easier. However isolating these user libraries/dependencies
>>> is a
>>> hard problem and so far the standard solution to this problem is to use
>>> operating systems containers via docker.
>>>
>>
>> Graal solves that, Ismaël. It is the same kind of experience as running npm
>> libs on Nashorn, but with a more unified API to run software in any language.
>>
>>
>>>
>>> The Beam vision from day zero is to run pipelines written in multiple
>>> languages in runners in multiple systems, and so far we are not doing
>>> this
>>> in particular in the Apache runners. The portability work is the cleanest
>>> way to achieve this vision given the constraints.
>>>
>>
>> Hmm, did I read it wrong, and we don't have specific integration of the
>> portable API in runners? That is what is messing up the runners and
>> limiting Beam adoption on existing runners.
>> The portable API is a feature buildable on top of a runner, not in runners.
>> Just as a runner implementing the 5-6 primitives can run anything, the
>> portable API should rely on that alone and not require more integration.
>> It doesn't prevent deeper integrations, as with some higher-level
>> primitives that exist in runners, but that is not the case for runners
>> today, so it shouldn't exist IMHO.
>>
>>
>>>
>>> I agree however that for the Java SDK to Java runner case this can
>>> represent additional pain, docker ideally should not be a requirement for
>>> Java users with the Direct runner and debugging a pipeline should be as
>>> easy as it is today. I think the Universal Local Runner exists to cover
>>> the Portable case, but after looking at this JIRA I am not sure if
>>> unification is coming (and by consequence if docker would be mandatory).
>>> https://issues.apache.org/jira/browse/BEAM-4239
>>>
>>> I suppose that distributed runners must implement the full Portability
>>> APIs to be considered Beam multi-language compliant, but for performance
>>> reasons they may prefer to translate the Java-to-Java case without the
>>> portability APIs.
>>>
>>
>>
>> This is my issue: language portability must NOT impact runners at all; it
>> is just a way to forward primitives to a runner.
>> See it as a layer rewriting the pipeline and submitting it. There is no
>> need to modify any runner.
>>
>>
>>> On Sat, May 5, 2018 at 9:11 AM Reuven Lax <re...@google.com> wrote:
>>>
>>> > A beam cluster with the spark runner would include a spark cluster,
>>> plus
>>> what's needed for portability, plus the beam sdk.
>>>
>>> > On Fri, May 4, 2018, 11:55 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com>
>>> wrote:
>>>
>>>
>>>
>>> >> On May 5, 2018 at 08:43, "Reuven Lax" <re...@google.com> wrote:
>>>
>>> >> I don't believe we enforce docker anywhere. In fact if someone wanted
>>> to
>>> run an all-windows beam cluster, they would probably not use docker for
>>> their runner (docker runs on Windows, but not efficiently).
>>>
>>>
>>>
>>> >> Or doesn't run sometimes - a colleague hit that yesterday :(.
>>>
>>> >> What is a "beam cluster", as opposed to a Spark or Flink cluster? How
>>> >> would it work on Windows servers?
>>>
>>>
>>> >> On Fri, May 4, 2018, 11:19 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com>
>>> wrote:
>>>
>>>
>>>
>>> >>> 2018-05-05 2:33 GMT+02:00 Andrew Pilloud <apill...@google.com>:
>>>
>>> >>>> What docker really buys is a package format and runtime environment
>>> that is language and operating system agnostic. The docker packaging and
>>> runtime format is the de facto standard for portable applications such as
>>> this, and there is a group trying to turn it into an actual standard.
>>>
>>> >>>> I would agree with you that dockerd has become bloated but there are
>>> projects that solve that. There is no longer lock-in to dockerd, there
>>> are
>>> package format compatible docker replacements that eliminate the
>>> performance issues and overhead associated with docker. CRI-O (
>>> https://github.com/kubernetes-incubator/cri-o) is a really cool RedHat
>>> project which is a minimalist replacement for docker. I was recently
>>> working at a startup where I migrated our "data mover" appliance from
>>> Docker to CRI-O. Our application was able to get direct access to the
>>> ethernet driver and block devices which enabled a huge performance boost
>>> but we were also able to run containers produced by docker without
>>> modification.
>>>
>>> >>>> You mention that docker is "detail of one runner+vendor corrupting
>>> all
>>> the project and adding complexity and work to everyone". It sounds like
>>> you
>>> have a specific example you'd like to share? Is there a runner that is
>>> unable to move to portability because of docker?
>>>
>>>
>>> >>> The IBM one for instance, some custom ones like a Hazelcast-based one,
>>> >>> etc... More generally, any runner developed outside Beam itself. Even
>>> >>> if we take a snapshot today, most of Beam's own runners have the same
>>> >>> pitfall.
>>>
>>> >>> Note: I never said docker was a bad technology. Let me try to
>>> >>> clarify.
>>>
>>> >>> The main issue is that you enforce docker usage, which is still a
>>> >>> trend. It is like Scala, which was promising to kill Java; check where
>>> >>> it is today...
>>> >>> Tooling for it is starting to appear, but it also has a big impact on
>>> >>> the deployment side, and for a good number of Beam users who deploy
>>> >>> outside the cloud it is an issue.
>>> >>> Keep in mind that Beam is embeddable by design; it is not a runner
>>> >>> environment. With the docker choice it imposes an environment, which
>>> >>> is inconsistent with Beam's own design, and this is where the choice
>>> >>> becomes a blocker.
>>>
>>>
>>>
>>> >>>> Andrew
>>>
>>> >>>> On Fri, May 4, 2018 at 4:32 PM Henning Rohde <hero...@google.com>
>>> wrote:
>>>
>>> >>>>> Romain,
>>>
>>> >>>>> Docker, unlike selinux, solves a great number of tangible problems
>>> for us with IMO a relatively small tax. It does not have to be the only
>>> way. Some of the concerns you bring up along with possibilities were also
>>> discussed here: https://s.apache.org/beam-fn-api-container-contract. I
>>> encourage you to take a look.
>>>
>>> >>>>> Thanks,
>>> >>>>>   Henning
>>>
>>>
>>> >>>>> On Fri, May 4, 2018 at 3:18 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>>
>>>
>>> >>>>>> On May 4, 2018 at 21:31, "Henning Rohde" <hero...@google.com> wrote:
>>>
>>> >>>>>> I disagree with the characterization of docker and the
>>> implications
>>> made towards portability. Graal looks like a neat project (and I never
>>> thought I would live to see the phrase "Practical Partial Evaluation"
>>> ..),
>>> but it doesn't address the needs of portability. In addition to Luke's
>>> examples, Go and most other languages don't work on it either. Docker
>>> containers also address packaging, OS dependencies, conflicting versions
>>> and distribution aspects in addition to truly universal language support.
>>>
>>>
>>> >>>>>> This is wrong: docker also has its conflicts and is not universal
>>> >>>>>> (it fails easily on Windows and Mac, as host or not; cloud vendors
>>> >>>>>> put layers on top that limit or corrupt it; and it is an imposed
>>> >>>>>> infra constraint and a vendor lock-in, not welcome in Beam IMHO).
>>>
>>> >>>>>> This is my main concern. All the work done looks like an
>>> >>>>>> implementation detail of one runner+vendor corrupting all the
>>> >>>>>> project and adding complexity and work to everyone instead of
>>> >>>>>> keeping it localised (technically it is possible).
>>>
>>> >>>>>> Would you accept it if I forced you to use selinux? Using docker is
>>> >>>>>> the same kind of constraint.
>>>
>>>
>>> >>>>>> That said, it's entirely fine for some runners to use Jython,
>>> Graal,
>>> etc to provide a specialized offering similar to the direct runners, but
>>> it
>>> would be disjoint from portability IMO.
>>>
>>> >>>>>> On Fri, May 4, 2018 at 10:14 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>>
>>>
>>> >>>>>>> On May 4, 2018 at 17:55, "Lukasz Cwik" <lc...@google.com> wrote:
>>>
>>> >>>>>>> I did take a look at Graal a while back when thinking about how
>>> execution environments could be defined, my concerns were related to it
>>> not
>>> supporting all of the features of a language.
>>> >>>>>>> For example, it's typical for Python to load and call native
>>> libraries and Graal can only execute C/C++ code that has been compiled to
>>> LLVM.
>>> >>>>>>> Also, a good amount of people interested in using ML libraries
>>> will
>>> want access to GPUs to improve performance which I believe that Graal
>>> can't
>>> support.
>>>
>>> >>>>>>> It can be a very useful way to run simple lambda functions
>>> >>>>>>> written in some language directly without needing to use a docker
>>> >>>>>>> environment, but you could probably use something even lighter
>>> >>>>>>> weight than Graal that is language specific, like Jython.
>>>
>>>
>>>
>>> >>>>>>> Right, the jsr223 impl works very well, but you can also get a
>>> >>>>>>> perf boost using native bindings (like the v8 Java binding for
>>> >>>>>>> JS, for instance). It is way more efficient than docker most of
>>> >>>>>>> the time and not code intrusive at all in runners, so likely
>>> >>>>>>> easier to adopt and maintain. That said, all of it is doable
>>> >>>>>>> behind jsr223, so maybe it is not a big deal in terms of API. We
>>> >>>>>>> just need to ensure the portability work stays clean and actually
>>> >>>>>>> portable, and doesn't impact runners the way the POCs done so far
>>> >>>>>>> did.
>>>
>>> >>>>>>> Works for me.
>>>
>>>
>>> >>>>>>> On Thu, May 3, 2018 at 10:05 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>> >>>>>>>> Hi guys
>>>
>>> >>>>>>>> For some time there have been efforts to add language-portable
>>> >>>>>>>> support to Beam, but I can't really find a case where it "works"
>>> >>>>>>>> when based on docker, except for some vendor-specific infra.
>>>
>>> >>>>>>>> Current solution:
>>>
>>> >>>>>>>> 1. Is runner intrusive (which is bad for Beam and prevents
>>> >>>>>>>>    adoption by big data vendors)
>>> >>>>>>>> 2. Is based on docker (which assumes a runtime environment, is
>>> >>>>>>>>    very ops/infra intrusive, and is quite often likely too
>>> >>>>>>>>    expensive for what it brings)
>>>
>>> >>>>>>>> Has anyone had a look at Graal, which seems to be a way to make
>>> >>>>>>>> the feature doable in a lighter and more optimized manner
>>> >>>>>>>> compared to the default jsr223 impls?
>>>
>>
>>
