Not sure what you mean? Can you point to a piece of code in Beam that you're currently characterizing as "hacking" and suggest how it could be refactored?
On Sat, May 5, 2018 at 2:06 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> All are good points.
>
> The only "?" I keep is: why doesn't Beam use its visitor API to make
> portability transversal to all runners by "mutating" the user model before
> translation? Technically it sounds easy and avoids hacking all the
> implementations. Was it tested, and did it fail?
>
> On May 5, 2018 at 22:50, "Thomas Weise" <t...@apache.org> wrote:
>
>> Docker isn't a silver bullet and may not be the best choice for all
>> environments (I'm also looking at potentially launching SDK workers in a
>> different way), but AFAIK there has not been any alternative proposal for
>> default SDK execution that can handle all of Python, Go and Java.
>>
>> Regardless of the default implementation, we should strive to keep the
>> implementation modular so users can plug in their own replacement as
>> needed. Looking at the prototype implementation, Docker comes downstream of
>> FlinkExecutableStageFunction, and it will be possible to supply a custom
>> implementation by making the translator pluggable (which I intend to work
>> on once backporting to master is complete), and possibly
>> "SDKHarnessManager" itself can also be swapped out.
>>
>> I would also prefer that for Flink and other Java based runners we retain
>> the option to inline executable stages that are in Java. I would expect a
>> good number of use cases to benefit from direct execution in the task
>> manager, and it may be good to offer the user that optimization.
>>
>> Thanks,
>> Thomas
>>
>> On Sat, May 5, 2018 at 12:54 PM, Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> To add on that: Romain, if you are really excited about Graal as a
>>> project, here are some constructive suggestions as to what you can do on a
>>> reasonably short timeframe:
>>> - Propose/prototype a design for writing UDFs in Beam SQL using Graal
>>> - Go through the portability-related design documents, come up with a
>>>   more precise assessment of what parts are actually dependent on Docker's
>>>   container format and/or on Docker itself, and propose a plan for
>>>   untangling this dependency and opening the door to other mechanisms of
>>>   cross-language execution
>>>
>>> On Sat, May 5, 2018 at 12:50 PM Eugene Kirpichov <kirpic...@google.com>
>>> wrote:
>>>
>>>> Graal is a very young project, currently nowhere near the level of
>>>> maturity or completeness as to be sufficient for Beam to fully bet its
>>>> portability vision on it:
>>>> - Graal currently only claims to support Java and Javascript, with Ruby
>>>>   and R in the status of "some applications may run", Python support "just
>>>>   beginning", and Go lacking altogether.
>>>> - Regarding existing production usage, the Graal FAQ says it is "a
>>>>   project with new innovative technology in its early stages."
>>>>
>>>> That said, as Graal matures, I think it would be reasonable to keep an
>>>> eye on it as a potential future lightweight alternative to containers for
>>>> pipelines where Graal's level of support is sufficient for this particular
>>>> pipeline.
>>>>
>>>> Please also keep in mind that execution of user code is only a small
>>>> part of the overall portability picture, and dependency on Docker is an
>>>> even smaller part of that (there is only 1 mention of the word "Docker" in
>>>> all of Beam's portability protos, and the mention is in an out-of-date TODO
>>>> comment). I hope this addresses your concerns.
>>>>
>>>> On Sat, May 5, 2018 at 11:49 AM Romain Manni-Bucau <rmannibu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Agree
>>>>>
>>>>> The JVM is still mainstream for big data, and it is trivial to have a
>>>>> remote facade to support native code, but there is no point in having it
>>>>> in the runners; it only concerns some particular transforms, or even just
>>>>> DoFns and sources...
>>>>>
>>>>> On May 5, 2018 at 19:03, "Andrew Pilloud" <apill...@google.com> wrote:
>>>>>
>>>>>> Thanks for the examples earlier, I think Hazelcast is a great
>>>>>> example of something portability might make more difficult. I'm not
>>>>>> working on portability, but my understanding is that the data sent to the
>>>>>> runner is a blob of code and the name of the container to run it in. A
>>>>>> runner with a native language (Java on Hazelcast, for example) could run
>>>>>> the code directly without the container if it is in a language it
>>>>>> supports. So when Hazelcast sees a known Java container specified, it just
>>>>>> loads the Java blob and runs it. When it sees another container it rejects
>>>>>> the pipeline. You could use Graal in the Hazelcast runner to do this for a
>>>>>> number of languages. I would expect that this could also be done in the
>>>>>> direct runner, which similarly provides a native Java environment, so
>>>>>> portable Java pipelines can be tested without Docker?
>>>>>>
>>>>>> For another way to frame this: if Beam was originally written in Go,
>>>>>> we would be having a different discussion. A pipeline written entirely in
>>>>>> Java wouldn't be possible, so instead to enable Hazelcast, we would have
>>>>>> to be able to run the Java from portability without running the container.
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Sat, May 5, 2018 at 1:48 AM Romain Manni-Bucau <rmannibu...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> 2018-05-05 9:27 GMT+02:00 Ismaël Mejía <ieme...@gmail.com>:
>>>>>>>
>>>>>>>> Graal would not be a viable solution for the reasons Henning and Andrew
>>>>>>>> mentioned, or put in other words, when users choose a programming
>>>>>>>> language they don’t choose only a ‘friendly’ syntax or programming
>>>>>>>> model, they choose also the ecosystem that comes with it, and the
>>>>>>>> libraries that make their life easier. However isolating these user
>>>>>>>> libraries/dependencies is a hard problem and so far the standard
>>>>>>>> solution to this problem is to use operating systems containers via
>>>>>>>> docker.
>>>>>>>
>>>>>>> Graal solves that, Ismaël. It is the same kind of experience as running
>>>>>>> npm libs on Nashorn, but with a more unified API to run software written
>>>>>>> in any language.
>>>>>>>
>>>>>>>> The Beam vision from day zero is to run pipelines written in multiple
>>>>>>>> languages in runners in multiple systems, and so far we are not doing
>>>>>>>> this in particular in the Apache runners. The portability work is the
>>>>>>>> cleanest way to achieve this vision given the constraints.
>>>>>>>
>>>>>>> Hmm, did I read it wrong and we don't have specific integration of
>>>>>>> the portable API in runners? This is what is messing up the runners and
>>>>>>> limiting Beam adoption on existing runners.
>>>>>>> The portable API is a feature buildable on top of runners, not in runners.
>>>>>>> Just as a runner implementing the 5-6 primitives can run anything,
>>>>>>> the portable API should just rely on that and not require more
>>>>>>> integration.
>>>>>>> It doesn't prevent deeper integrations, as for some higher-level
>>>>>>> primitives existing in runners, but that is not the case today for
>>>>>>> runners, so it shouldn't exist IMHO.
>>>>>>>
>>>>>>>> I agree however that for the Java SDK to Java runner case this can
>>>>>>>> represent additional pain, docker ideally should not be a requirement
>>>>>>>> for Java users with the Direct runner and debugging a pipeline should
>>>>>>>> be as easy as it is today. I think the Universal Local Runner exists to
>>>>>>>> cover the Portable case, but after looking at this JIRA I am not sure if
>>>>>>>> unification is coming (and by consequence if docker would be mandatory).
>>>>>>>> https://issues.apache.org/jira/browse/BEAM-4239
>>>>>>>>
>>>>>>>> I suppose for the distributed runners that they must implement the full
>>>>>>>> Portability APIs to be considered Beam multi language compliant but they
>>>>>>>> can prefer for performance reasons to translate without the portability
>>>>>>>> APIs the Java to Java case.
>>>>>>>
>>>>>>> This is my issue: language portability must NOT impact runners at
>>>>>>> all, it is just a way to forward primitives to a runner.
>>>>>>> See it as a layer rewriting the pipeline and submitting it. No need
>>>>>>> to modify any runner.
>>>>>>>
>>>>>>>> On Sat, May 5, 2018 at 9:11 AM Reuven Lax <re...@google.com> wrote:
>>>>>>>>
>>>>>>>> > A beam cluster with the spark runner would include a spark cluster,
>>>>>>>> > plus what's needed for portability, plus the beam sdk.
>>>>>>>>
>>>>>>>> > On Fri, May 4, 2018, 11:55 PM Romain Manni-Bucau <rmannibu...@gmail.com>
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>> >> On May 5, 2018 at 08:43, "Reuven Lax" <re...@google.com> wrote:
>>>>>>>>
>>>>>>>> >> I don't believe we enforce docker anywhere.
>>>>>>>> >> In fact, if someone wanted to run an all-Windows beam cluster, they
>>>>>>>> >> would probably not use docker for their runner (docker runs on
>>>>>>>> >> Windows, but not efficiently).
>>>>>>>>
>>>>>>>> >> Or doesn't run sometimes - a colleague hit that yesterday :(.
>>>>>>>>
>>>>>>>> >> What is a "beam cluster", as opposed to a Spark or Flink cluster? How
>>>>>>>> >> would it work on Windows servers?
>>>>>>>>
>>>>>>>> >> On Fri, May 4, 2018, 11:19 PM Romain Manni-Bucau <rmannibu...@gmail.com>
>>>>>>>> >> wrote:
>>>>>>>>
>>>>>>>> >>> 2018-05-05 2:33 GMT+02:00 Andrew Pilloud <apill...@google.com>:
>>>>>>>>
>>>>>>>> >>>> What docker really buys is a package format and runtime environment
>>>>>>>> >>>> that is language and operating system agnostic. The docker packaging
>>>>>>>> >>>> and runtime format is the de facto standard for portable applications
>>>>>>>> >>>> such as this, and there is a group trying to turn it into an actual
>>>>>>>> >>>> standard.
>>>>>>>>
>>>>>>>> >>>> I would agree with you that dockerd has become bloated but there are
>>>>>>>> >>>> projects that solve that. There is no longer lock-in to dockerd, there
>>>>>>>> >>>> are package format compatible docker replacements that eliminate the
>>>>>>>> >>>> performance issues and overhead associated with docker. CRI-O
>>>>>>>> >>>> (https://github.com/kubernetes-incubator/cri-o) is a really cool
>>>>>>>> >>>> RedHat project which is a minimalist replacement for docker. I was
>>>>>>>> >>>> recently working at a startup where I migrated our "data mover"
>>>>>>>> >>>> appliance from Docker to CRI-O. Our application was able to get direct
>>>>>>>> >>>> access to the ethernet driver and block devices which enabled a huge
>>>>>>>> >>>> performance boost but we were also able to run containers produced by
>>>>>>>> >>>> docker without modification.
>>>>>>>>
>>>>>>>> >>>> You mention that docker is "detail of one runner+vendor corrupting
>>>>>>>> >>>> all the project and adding complexity and work to everyone". It sounds
>>>>>>>> >>>> like you have a specific example you'd like to share? Is there a
>>>>>>>> >>>> runner that is unable to move to portability because of docker?
>>>>>>>>
>>>>>>>> >>> The IBM one for instance, some custom ones like a Hazelcast-based one,
>>>>>>>> >>> etc. More generally, any runner developed outside Beam itself; even if
>>>>>>>> >>> we take a snapshot today, most of Beam's own runners have the same
>>>>>>>> >>> pitfall.
>>>>>>>>
>>>>>>>> >>> Note: I never said Docker was a bad technology or anything like that.
>>>>>>>> >>> Let me try to clarify.
>>>>>>>>
>>>>>>>> >>> The main issue is that you enforce Docker usage, which is still a
>>>>>>>> >>> trend. It is like Scala, which was promising to kill Java; check what
>>>>>>>> >>> it does today...
>>>>>>>> >>> It is starting to get tooling, but it also has a large impact on the
>>>>>>>> >>> deployment side, and for a good number of Beam users who deploy
>>>>>>>> >>> outside the cloud it is an issue.
>>>>>>>> >>> Keep in mind Beam is embeddable by design; it is not a runner
>>>>>>>> >>> environment, and with the Docker choice it imposes an environment,
>>>>>>>> >>> which is inconsistent with Beam's own design. This is where this
>>>>>>>> >>> choice blocks.
>>>>>>>>
>>>>>>>> >>>> Andrew
>>>>>>>>
>>>>>>>> >>>> On Fri, May 4, 2018 at 4:32 PM Henning Rohde <hero...@google.com>
>>>>>>>> >>>> wrote:
>>>>>>>>
>>>>>>>> >>>>> Romain,
>>>>>>>>
>>>>>>>> >>>>> Docker, unlike selinux, solves a great number of tangible problems
>>>>>>>> >>>>> for us with IMO a relatively small tax. It does not have to be the
>>>>>>>> >>>>> only way. Some of the concerns you bring up along with possibilities
>>>>>>>> >>>>> were also discussed here:
>>>>>>>> >>>>> https://s.apache.org/beam-fn-api-container-contract.
>>>>>>>> >>>>> I encourage you to take a look.
>>>>>>>>
>>>>>>>> >>>>> Thanks,
>>>>>>>> >>>>> Henning
>>>>>>>>
>>>>>>>> >>>>> On Fri, May 4, 2018 at 3:18 PM Romain Manni-Bucau
>>>>>>>> >>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> >>>>>> On May 4, 2018 at 21:31, "Henning Rohde" <hero...@google.com> wrote:
>>>>>>>>
>>>>>>>> >>>>>> I disagree with the characterization of docker and the implications
>>>>>>>> >>>>>> made towards portability. Graal looks like a neat project (and I
>>>>>>>> >>>>>> never thought I would live to see the phrase "Practical Partial
>>>>>>>> >>>>>> Evaluation" ..), but it doesn't address the needs of portability. In
>>>>>>>> >>>>>> addition to Luke's examples, Go and most other languages don't work
>>>>>>>> >>>>>> on it either. Docker containers also address packaging, OS
>>>>>>>> >>>>>> dependencies, conflicting versions and distribution aspects in
>>>>>>>> >>>>>> addition to truly universal language support.
>>>>>>>>
>>>>>>>> >>>>>> This is wrong: Docker also has its conflicts and is not universal
>>>>>>>> >>>>>> (it fails easily on Windows and Mac, as host or not; cloud vendors
>>>>>>>> >>>>>> put layers on top that limit or corrupt it; and it is an imposed
>>>>>>>> >>>>>> infra constraint and a vendor lock-in that is not welcome in Beam
>>>>>>>> >>>>>> IMHO).
>>>>>>>>
>>>>>>>> >>>>>> This is my main concern. All the work done looks like an
>>>>>>>> >>>>>> implementation detail of one runner+vendor corrupting all the
>>>>>>>> >>>>>> project and adding complexity and work to everyone instead of
>>>>>>>> >>>>>> keeping it localised (technically it is possible).
>>>>>>>>
>>>>>>>> >>>>>> Would you accept it if I forced you to use SELinux? Using Docker is
>>>>>>>> >>>>>> the same kind of constraint.
>>>>>>>>
>>>>>>>> >>>>>> That said, it's entirely fine for some runners to use Jython, Graal,
>>>>>>>> >>>>>> etc to provide a specialized offering similar to the direct runners,
>>>>>>>> >>>>>> but it would be disjoint from portability IMO.
>>>>>>>>
>>>>>>>> >>>>>> On Fri, May 4, 2018 at 10:14 AM Romain Manni-Bucau
>>>>>>>> >>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> >>>>>>> On May 4, 2018 at 17:55, "Lukasz Cwik" <lc...@google.com> wrote:
>>>>>>>>
>>>>>>>> >>>>>>> I did take a look at Graal a while back when thinking about how
>>>>>>>> >>>>>>> execution environments could be defined, my concerns were related
>>>>>>>> >>>>>>> to it not supporting all of the features of a language.
>>>>>>>> >>>>>>> For example, it's typical for Python to load and call native
>>>>>>>> >>>>>>> libraries and Graal can only execute C/C++ code that has been
>>>>>>>> >>>>>>> compiled to LLVM.
>>>>>>>> >>>>>>> Also, a good amount of people interested in using ML libraries will
>>>>>>>> >>>>>>> want access to GPUs to improve performance which I believe that
>>>>>>>> >>>>>>> Graal can't support.
>>>>>>>>
>>>>>>>> >>>>>>> It can be a very useful way to run simple lambda functions written
>>>>>>>> >>>>>>> in some language directly without needing to use a docker
>>>>>>>> >>>>>>> environment but you could probably use something even lighter
>>>>>>>> >>>>>>> weight than Graal that is language specific like Jython.
>>>>>>>>
>>>>>>>> >>>>>>> Right, the JSR 223 implementation works very well, but you can also
>>>>>>>> >>>>>>> get a perf boost using native bindings (like the V8 Java binding
>>>>>>>> >>>>>>> for JS, for instance). It is way more efficient than Docker most of
>>>>>>>> >>>>>>> the time and not code-intrusive at all in runners, so it is likely
>>>>>>>> >>>>>>> more adoptable and maintainable. That said, everything is doable
>>>>>>>> >>>>>>> behind JSR 223, so it may not be a big deal in terms of API. We
>>>>>>>> >>>>>>> just need to ensure the portability work stays clean and actually
>>>>>>>> >>>>>>> portable, and doesn't impact runners the way the PoCs done until
>>>>>>>> >>>>>>> today did.
>>>>>>>>
>>>>>>>> >>>>>>> Works for me.
>>>>>>>>
>>>>>>>> >>>>>>> On Thu, May 3, 2018 at 10:05 PM Romain Manni-Bucau
>>>>>>>> >>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> >>>>>>>> Hi guys
>>>>>>>>
>>>>>>>> >>>>>>>> For some time there have been efforts to add language-portable
>>>>>>>> >>>>>>>> support to Beam, but I can't really find a case where it "works"
>>>>>>>> >>>>>>>> while being based on Docker, except for some vendor-specific
>>>>>>>> >>>>>>>> infra.
>>>>>>>>
>>>>>>>> >>>>>>>> The current solution:
>>>>>>>>
>>>>>>>> >>>>>>>> 1. Is runner-intrusive (which is bad for Beam and prevents
>>>>>>>> >>>>>>>>    adoption by big data vendors)
>>>>>>>> >>>>>>>> 2. Is based on Docker (which assumes a runtime environment, is
>>>>>>>> >>>>>>>>    very ops/infra-intrusive, and is quite often too costly ($$)
>>>>>>>> >>>>>>>>    for what it brings)
>>>>>>>>
>>>>>>>> >>>>>>>> Has anyone had a look at Graal, which seems to be a way to make
>>>>>>>> >>>>>>>> the feature doable in a lighter and more optimized manner
>>>>>>>> >>>>>>>> compared to the default JSR 223 implementations?
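
For readers following the thread, Andrew's description above (a runner receives a blob of code plus the name of the container to run it in, runs the blob directly when it recognizes the environment, and rejects the pipeline otherwise) can be sketched roughly as below. This is a toy model only, not Beam code: `ExecutableStage`, `InliningRunner`, `UnsupportedEnvironmentError` and the environment names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ExecutableStage:
    """Hypothetical stand-in for what a runner receives for one stage."""
    environment: str  # name of the container the SDK asked for (made-up values)
    payload: bytes    # opaque, SDK-serialized user code


class UnsupportedEnvironmentError(Exception):
    """Raised when the runner has no in-process executor for an environment."""


class InliningRunner:
    """Toy runner that only executes stages whose environment it can inline."""

    def __init__(self, inline_executors: Dict[str, Callable[[bytes], str]]):
        # Maps a known environment name to a function that runs the payload
        # in-process, without starting any container.
        self._inline_executors = inline_executors

    def execute(self, stage: ExecutableStage) -> str:
        executor = self._inline_executors.get(stage.environment)
        if executor is None:
            # Unknown container: reject the pipeline instead of launching it.
            raise UnsupportedEnvironmentError(stage.environment)
        return executor(stage.payload)


# Usage: a runner that only knows how to inline one (made-up) Java environment.
runner = InliningRunner({"example/java-sdk": lambda blob: "inlined:" + blob.decode()})
result = runner.execute(ExecutableStage("example/java-sdk", b"my-dofn"))
```

A runner that falls back to containers for unknown environments, as Thomas suggests for Flink, would replace the rejection branch with a container launch; that variant is left out here because the thread does not describe its interface.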