alxmrs commented on issue #22349:
URL: https://github.com/apache/beam/issues/22349#issuecomment-1190836330

   Maybe we should have a video chat soon to expedite this discussion. The 
Beam Docker image, as I understand it, packages a runtime environment for 
Beam workers (the specifics depend on the runner used; most commonly, it's 
the base image for workers running a Dataflow job). If you need to use 
Beam in an interactive context like a Jupyter notebook, why use the Docker 
image at all? Why not just install Beam with pip / conda and let users run the 
local runner? This would let users iterate on the pipelines themselves 
before "the data-processing step."
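   As a minimal sketch of that local-iteration workflow (assuming `apache-beam` is installed via pip; the pipeline contents are just illustrative):

   ```python
   # Illustrative only: iterate locally with Beam's DirectRunner before
   # deploying the same pipeline to a remote runner such as Dataflow.
   import apache_beam as beam
   from apache_beam.options.pipeline_options import PipelineOptions

   # DirectRunner executes in-process, which suits notebook-style iteration.
   options = PipelineOptions(runner="DirectRunner")

   with beam.Pipeline(options=options) as pipeline:
       (
           pipeline
           | "Create" >> beam.Create([1, 2, 3])
           | "Square" >> beam.Map(lambda x: x * x)
           | "Print" >> beam.Map(print)
       )
   ```

   The same pipeline code can later be pointed at a remote runner purely through options, which is what makes the two runtime environments separable.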
   
   If this is an accurate model of the problem you're trying to solve, then I'd 
like to make explicit the two types of runtime environments: 
   - Development time / interactive use. This could use the Python Beam SDK.
   - Remote execution & deployment. This is closer to what I had in mind with 
this issue. 
   
   From here, I can see the desire to create one image to handle both 
use cases. However, I bet it's better to handle each with its own 
Docker image, perhaps via a multi-stage build. 

