To add on that: Romain, if you are really excited about Graal as a project, here are some constructive suggestions as to what you can do on a reasonably short timeframe:

- Propose/prototype a design for writing UDFs in Beam SQL using Graal
- Go through the portability-related design documents, come up with a more precise assessment of what parts are actually dependent on Docker's container format and/or on Docker itself, and propose a plan for untangling this dependency and opening the door to other mechanisms of cross-language execution

On Sat, May 5, 2018 at 12:50 PM Eugene Kirpichov <kirpic...@google.com> wrote:

> Graal is a very young project, currently nowhere near the level of maturity or completeness that would be sufficient for Beam to fully bet its portability vision on it:
> - Graal currently only claims to support Java and Javascript, with Ruby and R in the status of "some applications may run", Python support "just beginning", and Go lacking altogether.
> - Regarding existing production usage, the Graal FAQ says it is "a project with new innovative technology in its early stages."
>
> That said, as Graal matures, I think it would be reasonable to keep an eye on it as a potential future lightweight alternative to containers for pipelines where Graal's level of support is sufficient.
>
> Please also keep in mind that execution of user code is only a small part of the overall portability picture, and dependency on Docker is an even smaller part of that (there is only 1 mention of the word "Docker" in all of Beam's portability protos, and the mention is in an out-of-date TODO comment). I hope this addresses your concerns.
>
> On Sat, May 5, 2018 at 11:49 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
>> Agree
>>
>> The jvm is still mainstream for big data and it is trivial to have a remote facade to support natives, but there is no point in having it in runners; it only concerns some particular transforms, or even just dofn and sources...
>>
>> On May 5, 2018 at 19:03, "Andrew Pilloud" <apill...@google.com> wrote:
>>
>>> Thanks for the examples earlier, I think Hazelcast is a great example of something portability might make more difficult. I'm not working on portability, but my understanding is that the data sent to the runner is a blob of code and the name of the container to run it in.
>>> A runner with a native language (java on Hazelcast for example) could run the code directly without the container if it is in a language it supports. So when Hazelcast sees a known java container specified, it just loads the java blob and runs it. When it sees another container it rejects the pipeline. You could use Graal in the Hazelcast runner to do this for a number of languages. I would expect that this could also be done in the direct runner, which similarly provides a native java environment, so portable Java pipelines can be tested without docker?
>>>
>>> For another way to frame this: if Beam had originally been written in Go, we would be having a different discussion. A pipeline written entirely in java wouldn't be possible, so instead, to enable Hazelcast, we would have to be able to run the java from portability without running the container.
>>>
>>> Andrew
>>>
>>> On Sat, May 5, 2018 at 1:48 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>
>>>> 2018-05-05 9:27 GMT+02:00 Ismaël Mejía <ieme...@gmail.com>:
>>>>
>>>>> Graal would not be a viable solution for the reasons Henning and Andrew mentioned, or put in other words, when users choose a programming language they don’t choose only a ‘friendly’ syntax or programming model, they also choose the ecosystem that comes with it, and the libraries that make their life easier. However, isolating these user libraries/dependencies is a hard problem, and so far the standard solution to this problem is to use operating system containers via docker.
>>>>
>>>> Graal solves that, Ismaël. It is the same kind of experience as running npm libs on nashorn, but with a more unified API to run software in any language.
>>>>
>>>>> The Beam vision from day zero is to run pipelines written in multiple languages in runners in multiple systems, and so far we are not doing this, in particular in the Apache runners. The portability work is the cleanest way to achieve this vision given the constraints.
>>>>
>>>> Hmm, did I read it wrong and we don't have specific integration of the portable API in runners? This is what is messing up the runners and limiting beam adoption on existing runners.
>>>> The portable API is a feature buildable on top of a runner, not in runners. Just as a runner implementing the 5-6 primitives can run anything, the portable API should just rely on that and not require more integration.
>>>> It doesn't prevent deeper integrations, such as for some higher level primitives existing in runners, but that is not the case today for runners, so it shouldn't exist IMHO.
>>>>
>>>>> I agree however that for the Java SDK to Java runner case this can represent additional pain; docker ideally should not be a requirement for Java users with the Direct runner, and debugging a pipeline should be as easy as it is today. I think the Universal Local Runner exists to cover the Portable case, but after looking at this JIRA I am not sure if unification is coming (and by consequence if docker would be mandatory).
>>>>> https://issues.apache.org/jira/browse/BEAM-4239
>>>>>
>>>>> I suppose that the distributed runners must implement the full Portability APIs to be considered Beam multi language compliant, but they can prefer, for performance reasons, to translate the Java to Java case without the portability APIs.
>>>>
>>>> This is my issue: language portability must NOT impact runners at all, it is just a way to forward primitives to a runner.
>>>> See it as a layer rewriting the pipeline and submitting it.
>>>> No need to modify any runner.
>>>>
>>>>> On Sat, May 5, 2018 at 9:11 AM Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>> > A beam cluster with the spark runner would include a spark cluster, plus what's needed for portability, plus the beam sdk.
>>>>>
>>>>> > On Fri, May 4, 2018, 11:55 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>
>>>>> >> On May 5, 2018 at 08:43, "Reuven Lax" <re...@google.com> wrote:
>>>>>
>>>>> >> I don't believe we enforce docker anywhere. In fact if someone wanted to run an all-windows beam cluster, they would probably not use docker for their runner (docker runs on Windows, but not efficiently).
>>>>>
>>>>> >> Or doesn't run sometimes - a colleague hit that yesterday :(.
>>>>>
>>>>> >> What is a "beam cluster" - as opposed to a spark or flink cluster? How would it work on windows servers?
>>>>>
>>>>> >> On Fri, May 4, 2018, 11:19 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>
>>>>> >>> 2018-05-05 2:33 GMT+02:00 Andrew Pilloud <apill...@google.com>:
>>>>>
>>>>> >>>> What docker really buys is a package format and runtime environment that is language and operating system agnostic. The docker packaging and runtime format is the de facto standard for portable applications such as this, and there is a group trying to turn it into an actual standard.
>>>>>
>>>>> >>>> I would agree with you that dockerd has become bloated but there are projects that solve that. There is no longer lock-in to dockerd, there are package format compatible docker replacements that eliminate the performance issues and overhead associated with docker. CRI-O (https://github.com/kubernetes-incubator/cri-o) is a really cool RedHat project which is a minimalist replacement for docker.
>>>>> >>>> I was recently working at a startup where I migrated our "data mover" appliance from Docker to CRI-O. Our application was able to get direct access to the ethernet driver and block devices, which enabled a huge performance boost, but we were also able to run containers produced by docker without modification.
>>>>>
>>>>> >>>> You mention that docker is "detail of one runner+vendor corrupting all the project and adding complexity and work to everyone". It sounds like you have a specific example you'd like to share? Is there a runner that is unable to move to portability because of docker?
>>>>>
>>>>> >>> IBM one for instance, some custom ones like a hazelcast based one, etc... More generally any runner developed outside beam itself - even if we take a snapshot today, most of beam's ones have the same pitfall.
>>>>>
>>>>> >>> Note: I never said docker was a bad techno or so. Let me try to clarify.
>>>>>
>>>>> >>> Main issue is that you enforce docker usage which is still trendy. It is like scala which was promising to kill java, check what it does today...
>>>>> >>> It starts to be tooled but it is also very impacting on the deployment side, and for a good number of beam users who deploy it outside the cloud it is an issue.
>>>>> >>> Keep in mind beam is embeddable by design, it is not a runner environment, and with the docker choice it imposes some environment which is inconsistent with beam design itself, and this is where this choice blocks.
>>>>>
>>>>> >>>> Andrew
>>>>>
>>>>> >>>> On Fri, May 4, 2018 at 4:32 PM Henning Rohde <hero...@google.com> wrote:
>>>>>
>>>>> >>>>> Romain,
>>>>>
>>>>> >>>>> Docker, unlike selinux, solves a great number of tangible problems for us with IMO a relatively small tax. It does not have to be the only way.
>>>>> >>>>> Some of the concerns you bring up along with possibilities were also discussed here: https://s.apache.org/beam-fn-api-container-contract. I encourage you to take a look.
>>>>>
>>>>> >>>>> Thanks,
>>>>> >>>>> Henning
>>>>>
>>>>> >>>>> On Fri, May 4, 2018 at 3:18 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>
>>>>> >>>>>> On May 4, 2018 at 21:31, "Henning Rohde" <hero...@google.com> wrote:
>>>>>
>>>>> >>>>>> I disagree with the characterization of docker and the implications made towards portability. Graal looks like a neat project (and I never thought I would live to see the phrase "Practical Partial Evaluation" ..), but it doesn't address the needs of portability. In addition to Luke's examples, Go and most other languages don't work on it either. Docker containers also address packaging, OS dependencies, conflicting versions and distribution aspects in addition to truly universal language support.
>>>>>
>>>>> >>>>>> This is wrong, docker also has its conflicts, is not universal (fails on windows and mac easily - as host or not, cloud vendors put layers limiting or corrupting it, and it is an infra constraint imposed and a vendor lock-in not welcomed in beam IMHO).
>>>>>
>>>>> >>>>>> This is my main concern. All the work done looks like an implementation detail of one runner+vendor corrupting all the project and adding complexity and work to everyone instead of keeping it localised (technically it is possible).
>>>>>
>>>>> >>>>>> Would you accept that I force you to use selinux? Using docker is the same kind of constraint.
>>>>>
>>>>> >>>>>> That said, it's entirely fine for some runners to use Jython, Graal, etc to provide a specialized offering similar to the direct runners, but it would be disjoint from portability IMO.
>>>>>
>>>>> >>>>>> On Fri, May 4, 2018 at 10:14 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>
>>>>> >>>>>>> On May 4, 2018 at 17:55, "Lukasz Cwik" <lc...@google.com> wrote:
>>>>>
>>>>> >>>>>>> I did take a look at Graal a while back when thinking about how execution environments could be defined; my concerns were related to it not supporting all of the features of a language.
>>>>> >>>>>>> For example, it's typical for Python to load and call native libraries, and Graal can only execute C/C++ code that has been compiled to LLVM.
>>>>> >>>>>>> Also, a good number of people interested in using ML libraries will want access to GPUs to improve performance, which I believe Graal can't support.
>>>>>
>>>>> >>>>>>> It can be a very useful way to run simple lambda functions written in some language directly without needing to use a docker environment, but you could probably use something even lighter weight than Graal that is language specific, like Jython.
>>>>>
>>>>> >>>>>>> Right, the jsr223 impl works very well but you can also have a perf boost using native (like v8 java binding for js for instance). It is way more efficient than docker most of the time and not code intrusive at all in runners, so likely more adoption-able and maintainable. That said, all is doable behind the jsr223 so maybe not a big deal in terms of api. We just need to ensure the portability work stays clean and actually portable and doesn't impact runners as the POCs done until today did.
>>>>>
>>>>> >>>>>>> Works for me.
>>>>>
>>>>> >>>>>>> On Thu, May 3, 2018 at 10:05 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>>>
>>>>> >>>>>>>> Hi guys
>>>>>
>>>>> >>>>>>>> For some time there have been efforts to have language portable support in beam, but I can't really find a case where it "works" being based on docker, except for some vendor specific infra.
>>>>>
>>>>> >>>>>>>> Current solution:
>>>>>
>>>>> >>>>>>>> 1. Is runner intrusive (which is bad for beam and prevents adoption by big data vendors)
>>>>> >>>>>>>> 2. Based on docker (which assumes a runtime environment and is very ops/infra intrusive and likely too $$ quite often for what it brings)
>>>>>
>>>>> >>>>>>>> Did anyone have a look at graal, which seems a way to make the feature doable in a lighter manner and optimized compared to default jsr223 impls?
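
A note on the dispatch Andrew describes earlier in the thread: the runner receives a blob of code plus the name of the container to run it in, runs the blob directly when it recognizes a natively supported environment, and rejects the pipeline otherwise. A minimal sketch of that idea follows. It is not Beam's actual API: `Stage`, `NativeRunner`, `UnsupportedEnvironment`, `run_java_in_process` and the environment name `java-sdk-harness` are all hypothetical names chosen for illustration, and the handler is a placeholder rather than real user-code execution.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Stage:
    """One unit of user code as the runner receives it (hypothetical shape)."""
    environment: str  # name of the container/environment requested by the SDK
    payload: bytes    # opaque blob of user code


class UnsupportedEnvironment(Exception):
    """Raised when the runner has no in-process handler for an environment."""


class NativeRunner:
    """Runs known environments in-process; rejects everything else."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[bytes], str]] = {}

    def register(self, environment: str, handler: Callable[[bytes], str]) -> None:
        # Declare that this runner can execute `environment` without a container.
        self._handlers[environment] = handler

    def run(self, stage: Stage) -> str:
        handler = self._handlers.get(stage.environment)
        if handler is None:
            # Unknown container name: reject the pipeline instead of starting docker.
            raise UnsupportedEnvironment(stage.environment)
        return handler(stage.payload)


def run_java_in_process(payload: bytes) -> str:
    # Placeholder: a real runner would deserialize and invoke the user code here.
    return f"ran {len(payload)} bytes in-process"


runner = NativeRunner()
runner.register("java-sdk-harness", run_java_in_process)
```

With this shape, supporting more languages in-process (for example through Graal, as suggested in the thread) would amount to registering additional handlers; the rejection path stays the same for anything the runner does not recognize.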