[ https://issues.apache.org/jira/browse/BEAM-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083494#comment-16083494 ]
Valentyn Tymofieiev commented on BEAM-2600:
-------------------------------------------
The Dataflow Java runner has a workerHarnessContainerImage pipeline option, but
it is specific to the Dataflow runner. I had a proposal [1] to introduce a
runner-independent option, but I came to realize that we may need finer
granularity than specifying one SDK harness per pipeline. We need to specify
the SDK harness separately for each component of the pipeline, such as a
DoFn/SDK function. The Beam FnAPI vision suggests running the SDK harness in
containerized processes, so I could see sdk_harness_container_image eventually
becoming a parameter of SdkFunctionSpec, but we would first have to clarify the
specification and expectations for SDK harness containers in Beam. The
questions of where to put the information about the SDK harness, and how
runners will use it, should not be specific to a particular SDK language.
[1] https://lists.apache.org/thread.html/af8bde64c4083783926781e2dd5167fc63465a0afd2849b43c017d61@%3Cdev.beam.apache.org%3E
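
To make the per-component granularity concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions: the Environment
class, its sdk_harness_container_image field, the image names, and the URNs are
hypothetical and are not an existing Beam API. Only the overall SdkFunctionSpec
shape follows the issue description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Environment:
    # Hypothetical: the SDK harness is identified purely by a container image.
    sdk_harness_container_image: str


@dataclass(frozen=True)
class FunctionSpec:
    urn: str
    data: bytes


@dataclass(frozen=True)
class SdkFunctionSpec:
    environment: Environment
    spec: FunctionSpec


def group_by_image(components: Dict[str, SdkFunctionSpec]) -> Dict[str, List[str]]:
    """Group component names by the harness image each one requests."""
    groups: Dict[str, List[str]] = {}
    for name, fn in components.items():
        image = fn.environment.sdk_harness_container_image
        groups.setdefault(image, []).append(name)
    return groups


# Placeholder image names, URNs, and payloads; none of these are real artifacts.
python_env = Environment("example.invalid/beam/python-harness:dev")
java_env = Environment("example.invalid/beam/java-harness:dev")

components = {
    "ParseLines": SdkFunctionSpec(
        python_env, FunctionSpec("urn:example:python:pickled_dofn", b"<pickled DoFn>")
    ),
    "ScoreRecords": SdkFunctionSpec(
        python_env, FunctionSpec("urn:example:python:pickled_dofn", b"<pickled DoFn>")
    ),
    "WriteOutput": SdkFunctionSpec(
        java_env, FunctionSpec("urn:example:java:serialized_dofn", b"<serialized DoFn>")
    ),
}

groups = group_by_image(components)
```

With one environment per function rather than one per pipeline, a runner could
see that ParseLines and ScoreRecords share a harness while WriteOutput needs a
different one, which a single pipeline-level option cannot express.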
> Artifact for Python SDK harness that can be referenced in pipeline definition
> -----------------------------------------------------------------------------
>
> Key: BEAM-2600
> URL: https://issues.apache.org/jira/browse/BEAM-2600
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py
> Reporter: Kenneth Knowles
> Assignee: Ahmet Altay
> Labels: beam-python-everywhere
>
> In order to build a pipeline that invokes a Python UDF, we need to be able to
> construct something like this:
> {code}
> SdkFunctionSpec {
>   environment = <python SDK harness>,
>   spec = {
>     urn = <python SDK pickled DoFn>,
>     data = <pickled DoFn>
>   }
> }
> {code}
> I could be out of date, but based on a couple of conversations I do not know
> that there exists anything we can put for "<python SDK harness>" today. For
> prototyping, it could be just a symbol that runners have to know. But
> eventually it should be something that runners can instantiate without
> knowing anything about the SDK that put it there. I imagine it may encompass
> "custom containers" eventually, though that doesn't block anything
> immediately.
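
One possible reading of "something that runners can instantiate without knowing
anything about the SDK" is sketched below in Python. It assumes the environment
is just a container image and that every harness container accepts the same
control endpoint argument. The Environment class, the --control_endpoint
argument, and the image name are hypothetical; only the docker run command and
its --rm flag are real. The command is built but not executed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Environment:
    # Hypothetical: an opaque container image reference put there by the SDK.
    container_image: str


def harness_launch_command(env: Environment, control_endpoint: str) -> List[str]:
    """Build the argv a runner could use to start the harness container.

    The runner never inspects which SDK language the image contains. It only
    relies on an assumed, language-neutral contract: the container accepts a
    --control_endpoint argument telling it where to reach the runner.
    """
    return [
        "docker",
        "run",
        "--rm",
        env.container_image,
        f"--control_endpoint={control_endpoint}",
    ]


# Placeholder image name and endpoint, for illustration only.
env = Environment("example.invalid/beam/python-harness:dev")
cmd = harness_launch_command(env, "localhost:8099")
```

The point of the sketch is that the same function would serve a harness for any
SDK language, provided the container contract is specified once for all of
Beam.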