Not sure what you mean? Can you point to a piece of code in Beam that
you're currently characterizing as "hacking" and suggest how it could be
refactored?

On Sat, May 5, 2018 at 2:06 PM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> All are good points.
>
> The only "?" I still have is: why doesn't Beam use its visitor API to make
> portability transversal to all runners, "mutating" the user model before
> translation? Technically it sounds easy and avoids hacking all the
> implementations. Was it tested, and did it fail?
>
> On May 5, 2018 at 22:50, "Thomas Weise" <t...@apache.org> wrote:
>
>> Docker isn't a silver bullet and may not be the best choice for all
>> environments (I'm also looking at potentially launching SDK workers in a
>> different way), but AFAIK there has not been any alternative proposal for
>> default SDK execution that can handle all of Python, Go and Java.
>>
>> Regardless of the default implementation, we should strive to keep the
>> implementation modular so users can plug in their own replacement as
>> needed. Looking at the prototype implementation, Docker comes downstream of
>> FlinkExecutableStageFunction, and it will be possible to supply a custom
>> implementation by making the translator pluggable (which I intend to work
>> on once backporting to master is complete), and possibly
>> "SDKHarnessManager" itself can also be swapped out.
>>
>> I would also prefer that for Flink and other Java based runners we retain
>> the option to inline executable stages that are in Java. I would expect a
>> good number of use cases to benefit from direct execution in the task
>> manager, and it may be good to offer the user that optimization.
>>
>> Thanks,
>> Thomas
>>
>>
>>
>> On Sat, May 5, 2018 at 12:54 PM, Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> To add on that: Romain, if you are really excited about Graal as a
>>> project, here are some constructive suggestions as to what you can do on a
>>> reasonably short timeframe:
>>> - Propose/prototype a design for writing UDFs in Beam SQL using Graal
>>> - Go through the portability-related design documents, come up with a
>>> more precise assessment of what parts are actually dependent on Docker's
>>> container format and/or on Docker itself, and propose a plan for untangling
>>> this dependency and opening the door to other mechanisms of cross-language
>>> execution
>>>
>>> On Sat, May 5, 2018 at 12:50 PM Eugene Kirpichov <kirpic...@google.com>
>>> wrote:
>>>
>>>> Graal is a very young project, currently nowhere near the level of
>>>> maturity or completeness that would be sufficient for Beam to fully bet
>>>> its portability vision on it:
>>>> - Graal currently only claims to support Java and JavaScript, with Ruby
>>>> and R in the status of "some applications may run", Python support "just
>>>> beginning", and Go lacking altogether.
>>>> - Regarding existing production usage, the Graal FAQ says it is "a
>>>> project with new innovative technology in its early stages."
>>>>
>>>> That said, as Graal matures, I think it would be reasonable to keep an
>>>> eye on it as a potential future lightweight alternative to containers for
>>>> pipelines where Graal's level of support is sufficient.
>>>>
>>>> Please also keep in mind that execution of user code is only a small
>>>> part of the overall portability picture, and dependency on Docker is an
>>>> even smaller part of that (there is only 1 mention of the word "Docker" in
>>>> all of Beam's portability protos, and the mention is in an out-of-date TODO
>>>> comment). I hope this addresses your concerns.
>>>>
>>>> On Sat, May 5, 2018 at 11:49 AM Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> Agreed.
>>>>>
>>>>> The JVM is still mainstream for big data, and it is trivial to have a
>>>>> remote facade to support native code, but there is no point in having it
>>>>> in the runners: it only concerns some particular transforms, or even just
>>>>> DoFns and sources...
>>>>>
>>>>>
>>>>> On May 5, 2018 at 19:03, "Andrew Pilloud" <apill...@google.com> wrote:
>>>>>
>>>>>> Thanks for the examples earlier, I think Hazelcast is a great example
>>>>>> of something portability might make more difficult. I'm not working on
>>>>>> portability, but my understanding is that the data sent to the runner
>>>>>> is a blob of code and the name of the container to run it in. A runner
>>>>>> with a native language (java on Hazelcast for example) could run the
>>>>>> code directly without the container if it is in a language it supports.
>>>>>> So when Hazelcast sees a known java container specified, it just loads
>>>>>> the java blob and runs it. When it sees another container it rejects
>>>>>> the pipeline. You could use Graal in the Hazelcast runner to do this
>>>>>> for a number of languages. I would expect that this could also be done
>>>>>> in the direct runner, which similarly provides a native java
>>>>>> environment, so portable Java pipelines can be tested without docker?
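
A rough sketch of the dispatch described above, for illustration only. The
class name, the enum and the container names are all hypothetical and are not
Beam's portability API; the sketch only shows a runner deciding, per stage,
between running a Java payload in-process and rejecting an unknown container.

```java
import java.util.Set;

// Hypothetical sketch: none of these names exist in Beam.
final class EnvironmentDispatch {

    enum Decision { RUN_IN_PROCESS, REJECT }

    // Container names this imaginary runner can execute natively
    // (assumed values, for illustration only).
    private static final Set<String> KNOWN_JAVA_CONTAINERS =
            Set.of("example/java-sdk-harness:latest");

    // Decide what to do with a stage that asks for the given container.
    static Decision choose(String containerName) {
        if (containerName != null && KNOWN_JAVA_CONTAINERS.contains(containerName)) {
            // Known Java container: load the Java payload and run it directly.
            return Decision.RUN_IN_PROCESS;
        }
        // Any other container: this runner cannot execute it, so reject.
        return Decision.REJECT;
    }

    public static void main(String[] args) {
        System.out.println(choose("example/java-sdk-harness:latest"));
        System.out.println(choose("example/python-sdk-harness:latest"));
    }
}
```

A runner that embeds something like Graal could extend the known set with
more languages, and fall back to a container for everything else instead of
rejecting.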
>>>>>>
>>>>>> For another way to frame this: if Beam was originally written in Go,
>>>>>> we would be having a different discussion. A pipeline written entirely
>>>>>> in java wouldn't be possible, so instead to enable Hazelcast, we would
>>>>>> have to be able to run the java from portability without running the
>>>>>> container.
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Sat, May 5, 2018 at 1:48 AM Romain Manni-Bucau <
>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2018-05-05 9:27 GMT+02:00 Ismaël Mejía <ieme...@gmail.com>:
>>>>>>>
>>>>>>>> Graal would not be a viable solution for the reasons Henning and
>>>>>>>> Andrew mentioned, or put in other words, when users choose a
>>>>>>>> programming language they don’t choose only a ‘friendly’ syntax or
>>>>>>>> programming model, they choose also the ecosystem that comes with it,
>>>>>>>> and the libraries that make their life easier. However isolating
>>>>>>>> these user libraries/dependencies is a hard problem and so far the
>>>>>>>> standard solution to this problem is to use operating systems
>>>>>>>> containers via docker.
>>>>>>>>
>>>>>>>
>>>>>>> Graal solves that, Ismaël. It is the same kind of experience as running
>>>>>>> npm libs on Nashorn, but with a more unified API to run software
>>>>>>> written in any language.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The Beam vision from day zero is to run pipelines written in multiple
>>>>>>>> languages in runners in multiple systems, and so far we are not doing
>>>>>>>> this in particular in the Apache runners. The portability work is the
>>>>>>>> cleanest way to achieve this vision given the constraints.
>>>>>>>>
>>>>>>>
>>>>>>> Hmm, did I read it wrong, and we don't have specific integration of
>>>>>>> the portable API in the runners? That integration is what is messing up
>>>>>>> the runners and limiting Beam adoption on existing runners.
>>>>>>> The portable API is a feature that can be built on top of a runner, not
>>>>>>> inside the runners. Just as a runner implementing the 5-6 primitives
>>>>>>> can run anything, the portable API should rely on those primitives and
>>>>>>> not require more integration.
>>>>>>> This doesn't prevent deeper integrations, as for some higher-level
>>>>>>> primitives that exist in runners, but that is not the case for runners
>>>>>>> today, so it shouldn't exist IMHO.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> I agree however that for the Java SDK to Java runner case this can
>>>>>>>> represent additional pain, docker ideally should not be a requirement
>>>>>>>> for Java users with the Direct runner and debugging a pipeline should
>>>>>>>> be as easy as it is today. I think the Universal Local Runner exists
>>>>>>>> to cover the Portable case, but after looking at this JIRA I am not
>>>>>>>> sure if unification is coming (and by consequence if docker would be
>>>>>>>> mandatory).
>>>>>>>> https://issues.apache.org/jira/browse/BEAM-4239
>>>>>>>>
>>>>>>>> I suppose that the distributed runners must implement the full
>>>>>>>> Portability APIs to be considered Beam multi-language compliant, but
>>>>>>>> for performance reasons they may prefer to translate the Java to Java
>>>>>>>> case without the portability APIs.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This is my issue, language portability must NOT impact runners at
>>>>>>> all, it is just a way to forward primitives to a runner.
>>>>>>> See it as a layer rewriting the pipeline and submitting it. No need
>>>>>>> to modify any runner.
>>>>>>>
>>>>>>>
>>>>>>>> On Sat, May 5, 2018 at 9:11 AM Reuven Lax <re...@google.com> wrote:
>>>>>>>>
>>>>>>>> > A beam cluster with the spark runner would include a spark cluster,
>>>>>>>> > plus what's needed for portability, plus the beam sdk.
>>>>>>>>
>>>>>>>> > On Fri, May 4, 2018, 11:55 PM Romain Manni-Bucau
>>>>>>>> > <rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> >> On May 5, 2018 at 08:43, "Reuven Lax" <re...@google.com> wrote:
>>>>>>>>
>>>>>>>> >> I don't believe we enforce docker anywhere. In fact if someone
>>>>>>>> >> wanted to run an all-windows beam cluster, they would probably not
>>>>>>>> >> use docker for their runner (docker runs on Windows, but not
>>>>>>>> >> efficiently).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> >> Or sometimes doesn't run at all - a colleague hit that yesterday :(.
>>>>>>>>
>>>>>>>> >> What is a "beam cluster", as opposed to a Spark or Flink cluster?
>>>>>>>> >> How would it work on Windows servers?
>>>>>>>>
>>>>>>>>
>>>>>>>> >> On Fri, May 4, 2018, 11:19 PM Romain Manni-Bucau
>>>>>>>> >> <rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> >>> 2018-05-05 2:33 GMT+02:00 Andrew Pilloud <apill...@google.com>:
>>>>>>>>
>>>>>>>> >>>> What docker really buys is a package format and runtime
>>>>>>>> >>>> environment that is language and operating system agnostic. The
>>>>>>>> >>>> docker packaging and runtime format is the de facto standard for
>>>>>>>> >>>> portable applications such as this, and there is a group trying
>>>>>>>> >>>> to turn it into an actual standard.
>>>>>>>>
>>>>>>>> >>>> I would agree with you that dockerd has become bloated but there
>>>>>>>> >>>> are projects that solve that. There is no longer lock-in to
>>>>>>>> >>>> dockerd, there are package format compatible docker replacements
>>>>>>>> >>>> that eliminate the performance issues and overhead associated
>>>>>>>> >>>> with docker. CRI-O (https://github.com/kubernetes-incubator/cri-o)
>>>>>>>> >>>> is a really cool RedHat project which is a minimalist replacement
>>>>>>>> >>>> for docker. I was recently working at a startup where I migrated
>>>>>>>> >>>> our "data mover" appliance from Docker to CRI-O. Our application
>>>>>>>> >>>> was able to get direct access to the ethernet driver and block
>>>>>>>> >>>> devices which enabled a huge performance boost but we were also
>>>>>>>> >>>> able to run containers produced by docker without modification.
>>>>>>>>
>>>>>>>> >>>> You mention that docker is "detail of one runner+vendor
>>>>>>>> >>>> corrupting all the project and adding complexity and work to
>>>>>>>> >>>> everyone". It sounds like you have a specific example you'd like
>>>>>>>> >>>> to share? Is there a runner that is unable to move to portability
>>>>>>>> >>>> because of docker?
>>>>>>>>
>>>>>>>>
>>>>>>>> >>> The IBM one for instance, some custom ones like a Hazelcast-based
>>>>>>>> >>> one, etc... More generally, any runner developed outside Beam
>>>>>>>> >>> itself - and even if we take a snapshot today, most of Beam's own
>>>>>>>> >>> runners have the same pitfall.
>>>>>>>>
>>>>>>>> >>> Note: I never said Docker was a bad technology or anything like
>>>>>>>> >>> that. Let me try to clarify.
>>>>>>>>
>>>>>>>> >>> The main issue is that you enforce the use of Docker, which is
>>>>>>>> >>> still a trend. It is like Scala, which was promising to kill Java;
>>>>>>>> >>> check what it does today...
>>>>>>>> >>> It is starting to get tooling, but it also has a big impact on the
>>>>>>>> >>> deployment side, and that is an issue for a good number of Beam
>>>>>>>> >>> users who deploy outside the cloud.
>>>>>>>> >>> Keep in mind that Beam is embeddable by design; it is not a runner
>>>>>>>> >>> environment. With the Docker choice it imposes an environment,
>>>>>>>> >>> which is inconsistent with Beam's own design, and this is where
>>>>>>>> >>> the choice blocks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> >>>> Andrew
>>>>>>>>
>>>>>>>> >>>> On Fri, May 4, 2018 at 4:32 PM Henning Rohde <hero...@google.com>
>>>>>>>> >>>> wrote:
>>>>>>>>
>>>>>>>> >>>>> Romain,
>>>>>>>>
>>>>>>>> >>>>> Docker, unlike selinux, solves a great number of tangible
>>>>>>>> >>>>> problems for us with IMO a relatively small tax. It does not
>>>>>>>> >>>>> have to be the only way. Some of the concerns you bring up along
>>>>>>>> >>>>> with possibilities were also discussed here:
>>>>>>>> >>>>> https://s.apache.org/beam-fn-api-container-contract. I encourage
>>>>>>>> >>>>> you to take a look.
>>>>>>>>
>>>>>>>> >>>>> Thanks,
>>>>>>>> >>>>>   Henning
>>>>>>>>
>>>>>>>>
>>>>>>>> >>>>> On Fri, May 4, 2018 at 3:18 PM Romain Manni-Bucau
>>>>>>>> >>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> >>>>>> On May 4, 2018 at 21:31, "Henning Rohde" <hero...@google.com>
>>>>>>>> >>>>>> wrote:
>>>>>>>>
>>>>>>>> >>>>>> I disagree with the characterization of docker and the
>>>>>>>> >>>>>> implications made towards portability. Graal looks like a neat
>>>>>>>> >>>>>> project (and I never thought I would live to see the phrase
>>>>>>>> >>>>>> "Practical Partial Evaluation" ..), but it doesn't address the
>>>>>>>> >>>>>> needs of portability. In addition to Luke's examples, Go and
>>>>>>>> >>>>>> most other languages don't work on it either. Docker containers
>>>>>>>> >>>>>> also address packaging, OS dependencies, conflicting versions
>>>>>>>> >>>>>> and distribution aspects in addition to truly universal
>>>>>>>> >>>>>> language support.
>>>>>>>>
>>>>>>>>
>>>>>>>> >>>>>> This is wrong: Docker also has its conflicts and is not
>>>>>>>> >>>>>> universal. It fails easily on Windows and Mac (as host or not),
>>>>>>>> >>>>>> cloud vendors put layers on top that limit or corrupt it, and
>>>>>>>> >>>>>> it is an imposed infrastructure constraint and a vendor lock-in
>>>>>>>> >>>>>> that is not welcome in Beam IMHO.
>>>>>>>>
>>>>>>>> >>>>>> This is my main concern. All the work done looks like an
>>>>>>>> >>>>>> implementation detail of one runner+vendor corrupting all the
>>>>>>>> >>>>>> project and adding complexity and work to everyone instead of
>>>>>>>> >>>>>> keeping it localised (technically it is possible).
>>>>>>>>
>>>>>>>> >>>>>> Would you accept it if I forced you to use selinux? Using
>>>>>>>> >>>>>> docker is the same kind of constraint.
>>>>>>>>
>>>>>>>>
>>>>>>>> >>>>>> That said, it's entirely fine for some runners to use Jython,
>>>>>>>> >>>>>> Graal, etc to provide a specialized offering similar to the
>>>>>>>> >>>>>> direct runners, but it would be disjoint from portability IMO.
>>>>>>>>
>>>>>>>> >>>>>> On Fri, May 4, 2018 at 10:14 AM Romain Manni-Bucau
>>>>>>>> >>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> >>>>>>> On May 4, 2018 at 17:55, "Lukasz Cwik" <lc...@google.com>
>>>>>>>> >>>>>>> wrote:
>>>>>>>>
>>>>>>>> >>>>>>> I did take a look at Graal a while back when thinking about
>>>>>>>> >>>>>>> how execution environments could be defined; my concerns were
>>>>>>>> >>>>>>> related to it not supporting all of the features of a language.
>>>>>>>> >>>>>>> For example, it's typical for Python to load and call native
>>>>>>>> >>>>>>> libraries, and Graal can only execute C/C++ code that has been
>>>>>>>> >>>>>>> compiled to LLVM.
>>>>>>>> >>>>>>> Also, a good number of people interested in using ML libraries
>>>>>>>> >>>>>>> will want access to GPUs to improve performance, which I
>>>>>>>> >>>>>>> believe Graal can't support.
>>>>>>>>
>>>>>>>> >>>>>>> It can be a very useful way to run simple lambda functions
>>>>>>>> >>>>>>> written in some language directly without needing to use a
>>>>>>>> >>>>>>> docker environment, but you could probably use something even
>>>>>>>> >>>>>>> lighter weight than Graal that is language specific, like
>>>>>>>> >>>>>>> Jython.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> >>>>>>> Right, the jsr223 impl works very well, but you can also get a
>>>>>>>> >>>>>>> perf boost using native bindings (like the v8 java binding for
>>>>>>>> >>>>>>> js, for instance). It is way more efficient than docker most of
>>>>>>>> >>>>>>> the time and not code-intrusive at all in runners, so it is
>>>>>>>> >>>>>>> likely easier to adopt and maintain. That said, all of it is
>>>>>>>> >>>>>>> doable behind jsr223, so maybe it is not a big deal in terms of
>>>>>>>> >>>>>>> API. We just need to ensure the portability work stays clean
>>>>>>>> >>>>>>> and actually portable, and doesn't impact runners the way the
>>>>>>>> >>>>>>> POCs done until today did.
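
For readers unfamiliar with jsr223, a minimal sketch of what "behind the
jsr223" could look like, using only the standard javax.script API. The helper
class Jsr223Eval is hypothetical and not part of Beam, and which engines are
actually installed depends on the JVM and classpath, so the lookup may well
return nothing.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper, not part of Beam: evaluates a script through JSR-223.
final class Jsr223Eval {

    // Returns the script's result, or Optional.empty() when no engine with
    // that name is installed on this JVM (or the script evaluates to null).
    static Optional<Object> eval(String engineName, String script) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return Optional.empty(); // no such engine, the caller must fall back
        }
        try {
            return Optional.ofNullable(engine.eval(script));
        } catch (ScriptException e) {
            // Surface script failures as an unchecked exception.
            throw new IllegalStateException("script failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Prints Optional.empty when no JavaScript engine is on the classpath.
        System.out.println(eval("javascript", "1 + 2"));
    }
}
```

The engine name is the only language-specific part, which is the point made
above: a native binding or Graal could sit behind the same API without the
caller changing.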
>>>>>>>>
>>>>>>>> >>>>>>> Works for me.
>>>>>>>>
>>>>>>>>
>>>>>>>> >>>>>>> On Thu, May 3, 2018 at 10:05 PM Romain Manni-Bucau
>>>>>>>> >>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> >>>>>>>> Hi guys
>>>>>>>>
>>>>>>>> >>>>>>>> For some time there have been efforts to add language-portable
>>>>>>>> >>>>>>>> support to Beam, but I can't really find a case where it
>>>>>>>> >>>>>>>> "works" being based on docker, except for some vendor-specific
>>>>>>>> >>>>>>>> infra.
>>>>>>>>
>>>>>>>> >>>>>>>> Current solution:
>>>>>>>>
>>>>>>>> >>>>>>>> 1. Is runner intrusive (which is bad for Beam and prevents
>>>>>>>> >>>>>>>> adoption by big data vendors)
>>>>>>>> >>>>>>>> 2. Is based on docker (which assumes a runtime environment, is
>>>>>>>> >>>>>>>> very ops/infra intrusive and quite often likely too $$ for
>>>>>>>> >>>>>>>> what it brings)
>>>>>>>>
>>>>>>>> >>>>>>>> Did anyone have a look at Graal, which seems a way to make the
>>>>>>>> >>>>>>>> feature doable in a lighter manner, and optimized compared to
>>>>>>>> >>>>>>>> the default jsr223 impls?
>>>>>>>>
>>>>>>>
>>>>>>>
>>
