> For example, there could be access to a file system or other service to
> fetch metadata that is required to build the pipeline.

That's a good point. It's totally up to users to decide how they want to
deploy. I just think the jar solution would provide a useful option for
many, but not all, use cases.
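
For concreteness, the construction-time dependency described above might look
something like this hypothetical sketch (the names, paths, and metadata format
are made up), where the pipeline shape depends on metadata that is only
reachable from inside the cluster:

import json
import urllib.request
import apache_beam as beam

def build_pipeline(p, metadata_url):
    # Hypothetical: the list of inputs is only known after calling an internal
    # metadata service that a developer laptop typically cannot reach, so the
    # jar (or any other offline artifact) cannot be assembled outside the
    # infrastructure.
    with urllib.request.urlopen(metadata_url) as resp:
        tables = json.load(resp)["tables"]
    for table in tables:
        # one read per table discovered at construction time
        p | f"read_{table}" >> beam.io.ReadFromText(f"hdfs://warehouse/{table}/part-*")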

> That requires the (matching) Java environment on the Python developer's
> machine.

IIUC this recent PR, which lets the Python SDK fetch the job server from
Maven, should help with that: https://github.com/apache/beam/pull/9043
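
To illustrate what that enables from the Python side (rough sketch only, not
the exact API; option names may differ between SDK versions):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Sketch: with the job server jar pulled from Maven by the SDK, submitting a
# portable pipeline should not require building any Beam Java artifacts
# locally, only a Java runtime capable of running the fetched jar.
options = PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_master=localhost:8081",  # assumed Flink REST endpoint
    "--environment_type=LOOPBACK",    # assumed, keeps the example self-contained
])

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(["hello", "portable", "beam"])
     | beam.Map(print))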

Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com


On Wed, Aug 7, 2019 at 8:59 AM Thomas Weise <t...@apache.org> wrote:

> -->
>
>>
>>
>> > * The pipeline construction code itself may need access to cluster
>> resources. In such cases the jar file cannot be created offline.
>>
>> Could you elaborate?
>>
>
> The entry point is arbitrary code written by the user, not limited to Beam
> pipeline construction alone. For example, there could be access to a file
> system or other service to fetch metadata that is required to build the
> pipeline. Such services can be accessed when the code runs within the
> infrastructure, but typically not in a development environment.
>
>
>> > * For k8s deployment, a container image with the SDK and application
>> code is required for the worker. The jar file (which is really a derived
>> artifact) would need to be built in addition to the container image.
>>
>> Yes. For standard use, a vanilla SDK container published with a Beam
>> release + staged artifacts should be sufficient.
>>
>> > * To build such a jar file, the user would need a build environment with
>> job server and application code. Do we want to make that assumption?
>>
>> Actually, it's probably much easier than that. A jar file is just a
>> zip file with a standard structure, to which one can easily add (data)
>> files without having a full build environment. The (pre-compiled) main
>> class would know how to read this data to construct the pipeline and
>> kick off the job just like any other Flink job.
>>
>
> Before assembling the jar, the job server runs to create the ingredients.
> That requires the (matching) Java environment on the Python developer's
> machine.
>
>
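
On the "a jar is just a zip file" point above: appending the pipeline payload
to a prebuilt job server jar would not need a Java toolchain at all. Rough
sketch only; the file names and in-jar layout below are made up for
illustration, not an agreed format:

import shutil
import zipfile

# Hypothetical: start from a released job server jar whose main class knows
# how to read a pipeline definition and staged artifacts from a well-known
# location inside the jar, then append those files with plain zip tooling.
shutil.copyfile("beam-runners-flink-job-server.jar", "my-pipeline.jar")
with zipfile.ZipFile("my-pipeline.jar", "a") as jar:
    # serialized pipeline (assumed path and format)
    jar.write("pipeline.pb", arcname="BEAM-PIPELINE/pipeline.pb")
    # staged dependencies (assumed path and format)
    jar.write("staged-artifacts.zip", arcname="BEAM-PIPELINE/artifacts.zip")

# The result could then be submitted like any other Flink job, e.g.
# "flink run my-pipeline.jar", assuming the manifest already points at the
# main class mentioned above.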
