Cool, thanks!

It seems like there could be some good follow-ups to simplify things for
Python users so they don’t have to roll their own Dockerfiles (like allowing
them to provide a requirements.txt which is used in the Dockerfile) :)
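
As a rough sketch of what I mean (purely illustrative: the helper name
render_dockerfile and the base image tag below are made up, nothing like
this exists in Beam today as far as I know), the SDK could generate a small
Dockerfile that layers the user's requirements.txt on top of the SDK's
container image:

```python
# Hypothetical helper: renders a Dockerfile that installs the user's
# requirements.txt on top of an SDK-provided base image.
# The base image tag used in the example is an assumption, not a real image.


def render_dockerfile(base_image, requirements_path="requirements.txt"):
    """Return Dockerfile text that pip-installs the given requirements file."""
    lines = [
        "FROM {}".format(base_image),
        # Copy the requirements file into the image before installing.
        "COPY {} /tmp/requirements.txt".format(requirements_path),
        # Install at image build time, so nothing is installed at runtime.
        "RUN pip install --no-cache-dir -r /tmp/requirements.txt",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "example/beam-python-harness:latest" is a placeholder image name.
    print(render_dockerfile("example/beam-python-harness:latest"))
```

The user would then only hand over a requirements.txt and never have to
touch a Dockerfile directly.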

I’m really excited about the direction with the containerized runners :)

On Sat, Nov 18, 2017 at 6:12 PM Henning Rohde <[email protected]>
wrote:

> A benefit of using docker containers is that (nearly) arbitrary native
> dependencies can be installed in the container image itself by either the
> user or SDK. For example, the (minimal, in progress) Python container
> Dockerfile is here:
>
>
>
> https://github.com/apache/beam/blob/1039f5b9682fa6aa5fba256110c63caf4d0da41f/sdks/python/container/Dockerfile
>
> Any user could simply augment it with "pip install" commands, say, or use
> something else entirely (although the corresponding boot program may also
> need to change in that case). The Python SDK itself might also include
> options/scripts/etc to make common customizations easier to use to avoid
> installing them at runtime. Multiple Dockerfiles can also co-exist. How the
> container image is actually passed to the runner is a choice made by each
> SDK, which is why it's not discussed much in the portability context.
> But a uniform flag along the lines of --sdk_harness_container_image to
> include the image into the pipeline proto would seem desirable. That said,
> I don't think how all these capabilities would best be exposed to users has
> been much explored yet in any SDK.
>
> Finally, there have been several thoughts on cross-language pipelines and I
> think it's a very exciting aspect of the portability framework. A doc is
> here:
>
>    https://s.apache.org/beam-mixed-language-pipelines.
>
> It is also linked from the design section in the portability page.
>
> Thanks,
>  Henning
>
>
> On Sat, Nov 18, 2017 at 6:33 AM, Holden Karau <[email protected]>
> wrote:
>
> > So I was looking through https://beam.apache.org/contribute/portability/
> > which led me to BEAM-2900, and then to
> > https://docs.google.com/document/d/1n6s3BOxOPct3uF4UgbbI9O9rpdiKWFH9R6mtVmR7xp0/edit#
> > .
> >
> > I was wondering if any consideration is being given to native
> > dependencies that user code may have (especially things like Python
> > packages which can be super painful to deal with in a Spark cluster unless
> > you use one of the vendor solutions)?
> >
> > Also, and this may be a terrible idea, but has there been thought given to
> > the idea of cross-language pipelines (I see these in Spark occasionally
> > but with the DL stuff happening I suspect we might see users wanting
> > cross-language functionality more often)?
> >
> > I also saw "Proposal: introduce an option to pass SDK harness container
> > image in Beam SDKs" & it seems like Robert brought up the benefits of using
> > Docker for Python runners, but I don't see the details on how we would
> > expose that to users in the design docs I've found yet (which could very
> > well be because I'm not looking at the right docs).
> >
> > Cheers,
> >
> > Holden :)
> >
> > --
> > Twitter: https://twitter.com/holdenkarau
> >
>
-- 
Twitter: https://twitter.com/holdenkarau
