Agree. The JVM is still mainstream for big data, and it is trivial to have a remote facade to support native languages, but there is no point in having it in runners: only some particular transforms, or even just DoFns and sources, need it...
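A minimal sketch of the remote-facade idea, assuming the non-JVM code sits behind a hypothetical `NativeFacade` interface (in practice a subprocess, socket or native binding). A plain `java.util.function.Function` stands in for a Beam `DoFn` so the example runs without Beam on the classpath; `NativeFacade`, `RemoteFacadeFn` and the uppercase stub are illustrative names, not Beam APIs.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class RemoteFacadeSketch {

    /** Hypothetical facade to code running outside the JVM (subprocess, socket, native binding...). */
    interface NativeFacade {
        String invoke(String element);
    }

    /**
     * Plain-Java stand-in for a DoFn: the runner only sees a JVM function,
     * and the non-JVM code is reached through the facade, so the runner itself is unchanged.
     */
    static final class RemoteFacadeFn implements Function<String, String> {
        private final NativeFacade facade;

        RemoteFacadeFn(NativeFacade facade) {
            this.facade = facade;
        }

        @Override
        public String apply(String element) {
            // Each element is forwarded to the foreign runtime and the result handed back to the runner.
            return facade.invoke(element);
        }
    }

    public static void main(String[] args) {
        // In-process stub standing in for, e.g., a Python or Node.js process behind the facade.
        NativeFacade stub = element -> element.toUpperCase();
        List<String> out = List.of("a", "b").stream()
                .map(new RemoteFacadeFn(stub))
                .collect(Collectors.toList());
        System.out.println(out); // prints [A, B]
    }
}
```

The design choice being illustrated: only the wrapped transform knows another language is involved, while the runner executes an ordinary JVM function.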
On 5 May 2018 19:03, "Andrew Pilloud" <apill...@google.com> wrote:

> Thanks for the examples earlier. I think Hazelcast is a great example of something portability might make more difficult. I'm not working on portability, but my understanding is that the data sent to the runner is a blob of code and the name of the container to run it in. A runner with a native language (Java on Hazelcast, for example) could run the code directly, without the container, if it is in a language it supports. So when Hazelcast sees a known Java container specified, it just loads the Java blob and runs it. When it sees another container, it rejects the pipeline. You could use Graal in the Hazelcast runner to do this for a number of languages. I would expect that this could also be done in the direct runner, which similarly provides a native Java environment, so portable Java pipelines can be tested without Docker?
>
> For another way to frame this: if Beam had originally been written in Go, we would be having a different discussion. A pipeline written entirely in Java wouldn't be possible, so to enable Hazelcast we would instead have to be able to run the Java from portability without running the container.
>
> Andrew
>
> On Sat, May 5, 2018 at 1:48 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
>> 2018-05-05 9:27 GMT+02:00 Ismaël Mejía <ieme...@gmail.com>:
>>
>>> Graal would not be a viable solution for the reasons Henning and Andrew mentioned. Put in other words: when users choose a programming language, they don't choose only a 'friendly' syntax or programming model; they also choose the ecosystem that comes with it, and the libraries that make their life easier. However, isolating these user libraries/dependencies is a hard problem, and so far the standard solution to this problem is to use operating-system containers via Docker.
>>>
>>
>> Graal solves that, Ismael.
>> It is the same kind of experience as running npm libs on Nashorn, but with a more unified API to run software written in any language.
>>
>>> The Beam vision from day zero is to run pipelines written in multiple languages in runners in multiple systems, and so far we are not doing this, in particular in the Apache runners. The portability work is the cleanest way to achieve this vision given the constraints.
>>>
>>
>> Hmm, did I read it wrong, and we don't have a specific integration of the portable API in runners? This is what is messing up the runners and limiting Beam adoption on existing runners.
>> The portable API is a feature buildable on top of a runner, not in runners. Just as a runner implementing the 5-6 primitives can run anything, the portable API should just rely on that and not require more integration. It doesn't prevent deeper integrations, as for some higher-level primitives existing in runners, but that is not the case for runners today, so it shouldn't exist IMHO.
>>
>>> I agree, however, that for the Java SDK to Java runner case this can represent additional pain: Docker ideally should not be a requirement for Java users with the Direct runner, and debugging a pipeline should be as easy as it is today. I think the Universal Local Runner exists to cover the portable case, but after looking at this JIRA I am not sure if unification is coming (and, by consequence, if Docker would be mandatory).
>>> https://issues.apache.org/jira/browse/BEAM-4239
>>>
>>> I suppose the distributed runners must implement the full Portability APIs to be considered Beam multi-language compliant, but for performance reasons they may prefer to translate the Java-to-Java case without the portability APIs.
>>>
>>
>> This is my issue: language portability must NOT impact runners at all; it is just a way to forward primitives to a runner.
>> See it as a layer rewriting the pipeline and submitting it.
>> No need to modify any runner.
>>
>>> On Sat, May 5, 2018 at 9:11 AM Reuven Lax <re...@google.com> wrote:
>>>
>>> > A Beam cluster with the Spark runner would include a Spark cluster, plus what's needed for portability, plus the Beam SDK.
>>>
>>> > On Fri, May 4, 2018, 11:55 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>
>>> >> On 5 May 2018 08:43, "Reuven Lax" <re...@google.com> wrote:
>>>
>>> >> I don't believe we enforce Docker anywhere. In fact, if someone wanted to run an all-Windows Beam cluster, they would probably not use Docker for their runner (Docker runs on Windows, but not efficiently).
>>>
>>> >> Or sometimes it doesn't run at all; a colleague hit that yesterday :(.
>>>
>>> >> What is a "Beam cluster", as opposed to a Spark or Flink cluster? How would it work on Windows servers?
>>>
>>> >> On Fri, May 4, 2018, 11:19 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>
>>> >>> 2018-05-05 2:33 GMT+02:00 Andrew Pilloud <apill...@google.com>:
>>>
>>> >>>> What Docker really buys is a package format and runtime environment that is language- and operating-system-agnostic. The Docker packaging and runtime format is the de facto standard for portable applications such as this, and there is a group trying to turn it into an actual standard.
>>>
>>> >>>> I would agree with you that dockerd has become bloated, but there are projects that solve that. There is no longer lock-in to dockerd: there are package-format-compatible Docker replacements that eliminate the performance issues and overhead associated with Docker. CRI-O (https://github.com/kubernetes-incubator/cri-o) is a really cool Red Hat project which is a minimalist replacement for Docker. I was recently working at a startup where I migrated our "data mover" appliance from Docker to CRI-O.
>>> >>>> Our application was able to get direct access to the Ethernet driver and block devices, which enabled a huge performance boost, but we were also able to run containers produced by Docker without modification.
>>>
>>> >>>> You mention that Docker is a "detail of one runner+vendor corrupting all the project and adding complexity and work to everyone". It sounds like you have a specific example you'd like to share? Is there a runner that is unable to move to portability because of Docker?
>>>
>>> >>> The IBM one, for instance, and some custom ones like a Hazelcast-based one, etc. More generally, any runner developed outside Beam itself; even if we take a snapshot today, most of Beam's own runners have the same pitfall.
>>>
>>> >>> Note: I never said Docker was a bad technology or anything like that. Let me try to clarify.
>>>
>>> >>> The main issue is that you enforce Docker usage, which is still a trend. It is like Scala, which promised to kill Java; check where it is today...
>>> >>> It is starting to be well tooled, but it also has a big impact on the deployment side, and for a good number of Beam users who deploy it outside the cloud it is an issue.
>>> >>> Keep in mind that Beam is embeddable by design; it is not a runner environment. With the Docker choice it imposes an environment, which is inconsistent with Beam's own design, and this is where this choice blocks.
>>>
>>> >>>> Andrew
>>>
>>> >>>> On Fri, May 4, 2018 at 4:32 PM Henning Rohde <hero...@google.com> wrote:
>>>
>>> >>>>> Romain,
>>>
>>> >>>>> Docker, unlike SELinux, solves a great number of tangible problems for us with, IMO, a relatively small tax. It does not have to be the only way. Some of the concerns you bring up, along with possibilities, were also discussed here: https://s.apache.org/beam-fn-api-container-contract. I encourage you to take a look.
>>> >>>>> Thanks,
>>> >>>>> Henning
>>>
>>> >>>>> On Fri, May 4, 2018 at 3:18 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>
>>> >>>>>> On 4 May 2018 21:31, "Henning Rohde" <hero...@google.com> wrote:
>>>
>>> >>>>>> I disagree with the characterization of Docker and the implications made towards portability. Graal looks like a neat project (and I never thought I would live to see the phrase "Practical Partial Evaluation"...), but it doesn't address the needs of portability. In addition to Luke's examples, Go and most other languages don't work on it either. Docker containers also address packaging, OS dependencies, conflicting versions and distribution aspects, in addition to truly universal language support.
>>>
>>> >>>>>> This is wrong: Docker also has its conflicts and is not universal (it fails easily on Windows and Mac, whether as host or not; cloud vendors add layers that limit or corrupt it; and it is an imposed infrastructure constraint and a vendor lock-in not welcome in Beam, IMHO).
>>>
>>> >>>>>> This is my main concern. All the work done looks like an implementation detail of one runner+vendor corrupting the whole project and adding complexity and work for everyone, instead of keeping it localised (which is technically possible).
>>>
>>> >>>>>> Would you accept it if I forced you to use SELinux? Using Docker is the same kind of constraint.
>>>
>>> >>>>>> That said, it's entirely fine for some runners to use Jython, Graal, etc. to provide a specialized offering similar to the direct runners, but it would be disjoint from portability IMO.
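Andrew's suggestion in this thread, that a Java-native runner such as Hazelcast could run a stage in-process when it recognizes a Java environment and reject anything else, can be sketched as a lookup from environment identifier to an in-process loader. The `Stage` class, the `"java-in-process"` identifier and the loader map are hypothetical; real portable pipelines carry environment information in Beam's Runner API messages, which this sketch does not model.

```java
import java.util.Map;
import java.util.function.Function;

public class EnvironmentDispatchSketch {

    /** Hypothetical stage description: the environment it asks for plus an opaque code payload. */
    static final class Stage {
        final String environment;
        final String payload;

        Stage(String environment, String payload) {
            this.environment = environment;
            this.payload = payload;
        }
    }

    /** Environments this runner can execute in-process, keyed by a hypothetical identifier. */
    private final Map<String, Function<String, Runnable>> nativeLoaders;

    EnvironmentDispatchSketch(Map<String, Function<String, Runnable>> nativeLoaders) {
        this.nativeLoaders = nativeLoaders;
    }

    /**
     * Returns an in-process executable when the stage's environment is natively supported;
     * otherwise rejects the stage instead of starting a container.
     */
    Runnable plan(Stage stage) {
        Function<String, Runnable> loader = nativeLoaders.get(stage.environment);
        if (loader == null) {
            throw new UnsupportedOperationException(
                    "Environment not supported without a container: " + stage.environment);
        }
        return loader.apply(stage.payload);
    }

    public static void main(String[] args) {
        // "java-in-process" is a made-up identifier, not a real Beam environment URN.
        Map<String, Function<String, Runnable>> loaders =
                Map.of("java-in-process", payload -> () -> System.out.println("running " + payload));
        EnvironmentDispatchSketch runner = new EnvironmentDispatchSketch(loaders);

        runner.plan(new Stage("java-in-process", "MyDoFn")).run();
        try {
            runner.plan(new Stage("python-container", "my_dofn"));
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Registering a Graal- or JSR 223-backed loader under another identifier would extend the same table to more languages, which is the "specialized offering" Henning describes as separate from portability.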
>>> >>>>>> On Fri, May 4, 2018 at 10:14 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>
>>> >>>>>>> On 4 May 2018 17:55, "Lukasz Cwik" <lc...@google.com> wrote:
>>>
>>> >>>>>>> I did take a look at Graal a while back when thinking about how execution environments could be defined; my concerns were related to it not supporting all of the features of a language.
>>> >>>>>>> For example, it's typical for Python to load and call native libraries, and Graal can only execute C/C++ code that has been compiled to LLVM.
>>> >>>>>>> Also, a good number of people interested in using ML libraries will want access to GPUs to improve performance, which I believe Graal can't support.
>>>
>>> >>>>>>> It can be a very useful way to run simple lambda functions written in some language directly, without needing a Docker environment, but you could probably use something even lighter-weight than Graal that is language-specific, like Jython.
>>>
>>> >>>>>>> Right, the JSR 223 implementation works very well, but you can also get a performance boost using native bindings (like the V8 Java binding for JS, for instance). It is far more efficient than Docker most of the time, and not code-intrusive at all in runners, so likely easier to adopt and maintain. That said, all of it is doable behind JSR 223, so maybe it is not a big deal in terms of API. We just need to ensure the portability work stays clean and actually portable, and doesn't impact runners the way the PoCs done until today did.
>>>
>>> >>>>>>> Works for me.
>>>
>>> >>>>>>> On Thu, May 3, 2018 at 10:05 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>>
>>> >>>>>>>> Hi guys,
>>>
>>> >>>>>>>> For some time there have been efforts to add language-portable support to Beam, but since it is based on Docker, I can't really find a case where it "works" except on some vendor-specific infrastructure.
>>>
>>> >>>>>>>> Current solution:
>>>
>>> >>>>>>>> 1. It is runner-intrusive (which is bad for Beam and prevents adoption by big data vendors).
>>> >>>>>>>> 2. It is based on Docker (which assumes a runtime environment, is very ops/infra-intrusive, and is quite often too expensive for what it brings).
>>>
>>> >>>>>>>> Has anyone had a look at Graal, which seems a way to make the feature doable in a lighter and more optimized manner than the default JSR 223 implementations?
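Several messages here hinge on JSR 223 (`javax.script`), so a minimal sketch of looking up and calling a script engine by language name may help. Which engines exist depends entirely on the JDK and classpath: Nashorn shipped with JDK 8 through 14 and was removed in JDK 15, and a Python engine is only present if something like Jython is on the classpath, so the code treats a missing engine as a normal case. The class and method names are illustrative, not from Beam.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class Jsr223Sketch {

    private static final ScriptEngineManager MANAGER = new ScriptEngineManager();

    /** Looks up a JSR 223 engine by short name ("python", "js", ...); empty if none is registered. */
    static Optional<ScriptEngine> engineFor(String language) {
        return Optional.ofNullable(MANAGER.getEngineByName(language));
    }

    /** Evaluates a snippet with whatever engine is registered for the language. */
    static Object eval(String language, String script) throws ScriptException {
        ScriptEngine engine = engineFor(language)
                .orElseThrow(() -> new IllegalStateException("No JSR 223 engine for " + language));
        return engine.eval(script);
    }

    public static void main(String[] args) throws ScriptException {
        // List every engine discovered on the classpath; the result varies by JDK and dependencies.
        for (ScriptEngineFactory f : MANAGER.getEngineFactories()) {
            System.out.println(f.getEngineName() + " " + f.getNames());
        }
        if (engineFor("python").isPresent()) {
            System.out.println(eval("python", "1 + 2"));
        } else {
            System.out.println("No Python engine registered; add an implementation such as Jython.");
        }
    }
}
```

Because callers only see `ScriptEngine`, swapping a pure-JVM implementation for one backed by a native binding would not change this calling code, which is the sense in which "all is doable behind JSR 223".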