Valentyn Tymofieiev commented on BEAM-3950:

We should start by allowing --sdk_location to point to a wheel file and
handling it correctly, without renaming it to a tar. This will unblock release
qualification of wheels and allow users to pass wheels if they want to,
although that alone is not a very convenient user experience.
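As a rough sketch of what "handle that correctly" could mean: keep a wheel's original file name when staging it, since pip relies on the version and platform tags encoded in wheel names, instead of renaming everything to the single tarball name. The staged tarball name `dataflow_python_sdk.tar` comes from the current runner behavior; the helper itself is hypothetical, not the actual Beam implementation.

```python
import os


def staged_sdk_name(sdk_location):
    """Pick the name to stage the SDK package under (hypothetical helper).

    Wheel file names encode version/ABI/platform tags that pip needs to
    select a compatible package, so they must be preserved as-is. Source
    distributions keep the fixed name the Dataflow worker expects today.
    """
    name = os.path.basename(sdk_location)
    if name.endswith(".whl"):
        return name
    return "dataflow_python_sdk.tar"
```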

If we want to stage a wheel by default, I think we should stage both the source
tarball and the wheel(s). The worker container should then decide which to use:
it can try the wheel and, if that does not work out, fall back to the tarball.
 * Custom containers may have a platform that is incompatible with the wheel we 
choose to stage.
 * Python 3 containers may choose to install the SDK from sources for some time,
until we start building Python 3 wheels.
 * Wheels may not be immediately recognized by Dataflow worker containers, 
although this is not critical if we can wait with SDK changes.
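The try-the-wheel-then-fall-back idea above can be sketched as follows. The file-name patterns and the install loop are assumptions for illustration, not the actual worker container logic:

```python
import subprocess
import sys


def sdk_install_candidates(file_names):
    """Order staged SDK artifacts: prefer wheels, fall back to source tarballs."""
    wheels = sorted(n for n in file_names if n.endswith(".whl"))
    tarballs = sorted(n for n in file_names if n.endswith(".tar.gz"))
    return wheels + tarballs


def install_staged_sdk(staged_files):
    """Try each staged package in preference order until one installs.

    A wheel built for an incompatible platform will fail to install,
    in which case we simply move on to the source tarball.
    """
    for candidate in sdk_install_candidates(staged_files):
        result = subprocess.run(
            [sys.executable, "-m", "pip", "install", candidate])
        if result.returncode == 0:
            return candidate
    raise RuntimeError("no staged SDK package could be installed")
```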

Starting from version 2.4, the Dataflow SDK should already be installing its
apache-beam[gcp] dependency on Dataflow workers from wheels.

> Dataflow Runner should supply a wheel version of Python SDK if it is available
> ------------------------------------------------------------------------------
>                 Key: BEAM-3950
>                 URL: https://issues.apache.org/jira/browse/BEAM-3950
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow, sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
