alxmrs commented on issue #22349: URL: https://github.com/apache/beam/issues/22349#issuecomment-1190836330
Maybe we should have a video chat soon to expedite this discussion.

The Beam Docker image, in my understanding, packages a runtime environment for Beam workers (the specifics of which depend on the runner; most commonly, it is the base image for workers running a Dataflow job). If you need to use Beam in an interactive context like a Jupyter notebook, why use the Docker image at all? Why not just install Beam with pip or conda and let users run with the local runner? This would let users iterate on the pipelines themselves before the data-processing step.

If this is an accurate model of the problem you're trying to solve, then I'd like to make the two types of runtime environments explicit:

- Development time / interactive use. This could use the Python Beam SDK directly.
- Remote execution and deployment. This is closer to what I had in mind with this issue.

From here, I can see the appeal of creating one image to handle both use cases. However, I suspect it's better to handle each of these with its own Docker image, perhaps via a multi-stage build.
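For the multi-stage idea, something like the following could work. This is only a sketch: the stage names, tags, and the choice of Jupyter for the dev image are my assumptions, not anything agreed upon in this issue. The worker stage copies the boot entrypoint from the official Beam SDK image, which is the documented pattern for custom worker containers.

```Dockerfile
# Hypothetical multi-stage build; names and versions are illustrative.

# Shared base: just the Beam SDK.
FROM python:3.10-slim AS beam-base
RUN pip install --no-cache-dir apache-beam[gcp]

# Development / interactive image: adds Jupyter on top of the SDK.
FROM beam-base AS beam-dev
RUN pip install --no-cache-dir jupyterlab
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root"]

# Worker image for remote execution (e.g. a Dataflow custom container):
# copy the Beam boot entrypoint from the official SDK image.
FROM beam-base AS beam-worker
COPY --from=apache/beam_python3.10_sdk:2.40.0 /opt/apache/beam /opt/apache/beam
ENTRYPOINT ["/opt/apache/beam/boot"]
```

Each image is then built by targeting its stage, e.g. `docker build --target beam-dev -t my-beam-dev .` versus `--target beam-worker`, so the two use cases stay separate while sharing one Dockerfile.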
