[
https://issues.apache.org/jira/browse/BEAM-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788810#comment-16788810
]
Ryan Williams commented on BEAM-6765:
-------------------------------------
I'm seeing this while trying to run [this TF estimator
example|https://cloud.google.com/solutions/machine-learning/data-preprocessing-for-ml-with-tf-transform-pt2#read_raw_training_data]
([notebook|https://github.com/GoogleCloudPlatform/tf-estimator-tutorials/blob/7af539a0f4d6113986dde65abe96c9e1c7701ae0/00_Miscellaneous/tf_transform/tft-01%20-%20Babyweight%20preprocessing%20with%20tf.Transform.ipynb])
with any recent version of TensorFlow Transform (0.12.0 or 0.13.0, which depend
on Beam 2.10.0 and 2.11.0, respectively; both of those depend on pyarrow 0.11.1).
Running a Beam+Dataflow job that uses TFT requires staging [source
artifacts|https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/runners/portability/stager.py#L423]
for TFT, and therefore for pyarrow 0.11.1, but no source distribution exists for
the latter. Only pyarrow 0.11.0 and 0.12.1 have published sources.
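To reproduce the staging failure locally, one can ask pip for source distributions only. This is a sketch that assumes the Beam stager shells out to `pip download` with `--no-binary :all:` (check stager.py at the linked tag for the exact arguments); `build_sdist_download_cmd` and `try_sdist_download` are hypothetical helper names.

```python
import subprocess
import sys


def build_sdist_download_cmd(requirements_file, cache_dir):
    """Build a `pip download` command that only accepts source distributions.

    Assumption: this approximates how the Beam stager populates its
    requirements cache; it is not copied from Beam.
    """
    return [
        sys.executable, "-m", "pip", "download",
        "--dest", cache_dir,
        "-r", requirements_file,
        "--exists-action", "i",
        # Refuse wheels for every package, so only sdists can be fetched.
        "--no-binary", ":all:",
    ]


def try_sdist_download(requirements_file, cache_dir):
    """Run the command; a non-zero return code means some pin has no sdist."""
    result = subprocess.run(
        build_sdist_download_cmd(requirements_file, cache_dir),
        capture_output=True, text=True)
    return result.returncode, result.stderr
```

With a requirements.txt pinning `pyarrow==0.11.1`, a non-zero return code with "No matching distribution found" in stderr would point to the same problem. Since `:all:` applies to every package, other wheel-only dependencies can trigger it as well.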
Possible solutions:
* pyarrow publishes sources for 0.11.1
* Beam depends on a wider range of pyarrow versions (including 0.12.1? That would
come too late for Beam 2.10.0 / 2.11.0.)
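The second option comes down to which source releases fall inside Beam's allowed range. A minimal sketch, using the versions pip listed in the error quoted in this issue; the `>=0.11.1,<0.12.0` and `>=0.11.1,<0.13.0` ranges are illustrative assumptions rather than Beam's actual pins, and `parse_version` handles only plain dotted releases (real code should use `packaging.version`).

```python
def parse_version(v):
    """Parse a plain dotted release like '0.11.1' into a comparable tuple.

    Simplification: pre-release/dev suffixes are not handled.
    """
    return tuple(int(part) for part in v.split("."))


def satisfying(versions, lower, upper):
    """Return versions v with lower <= v < upper (a '>=lower,<upper' range)."""
    lo, hi = parse_version(lower), parse_version(upper)
    return [v for v in versions if lo <= parse_version(v) < hi]


# pyarrow versions pip could find in the error quoted in this issue.
SDIST_VERSIONS = ["0.9.0", "0.10.0", "0.11.0", "0.12.1"]

# Hypothetical ranges for illustration only.
print(satisfying(SDIST_VERSIONS, "0.11.1", "0.12.0"))  # → []
print(satisfying(SDIST_VERSIONS, "0.11.1", "0.13.0"))  # → ['0.12.1']
```

An empty result for the narrow range is exactly the "No matching distribution" situation; widening the upper bound lets pip pick 0.12.1.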
I'm curious why you closed this, [~barrywhart]; it seems like an ongoing problem
to me.
> Beam 2.10.0 for Python requires pyarrow 0.11.1, which is not installable in
> Google Cloud DataFlow
> -------------------------------------------------------------------------------------------------
>
> Key: BEAM-6765
> URL: https://issues.apache.org/jira/browse/BEAM-6765
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.10.0
> Reporter: Barry Hart
> Priority: Major
> Fix For: 2.10.0
>
>
> When trying to run a Beam 2.10.0 job in Google Cloud DataFlow, I get the
> following error:
> {noformat}
> Collecting pyarrow==0.11.1 (from -r requirements.txt (line 51))
> Could not find a version that satisfies the requirement pyarrow==0.11.1 (from
> -r requirements.txt (line 51)) (from versions: 0.9.0, 0.10.0, 0.11.0, 0.12.1)
> No matching distribution found for pyarrow==0.11.1 (from -r requirements.txt
> (line 51))
> {noformat}
> This version, while it exists, cannot be installed in Google Cloud DataFlow
> because it is only available on PyPI as a wheel, and DataFlow does not allow
> installing binary packages, only source packages.
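Whether a given release has a source distribution can be checked against PyPI's JSON API (`https://pypi.org/pypi/<project>/json`), whose `releases` field maps each version to its uploaded files, each tagged with a `packagetype` such as `sdist` or `bdist_wheel`. The helper names below are hypothetical, and live results today may differ from when this issue was filed.

```python
import json
import urllib.request


def fetch_releases(project):
    """Fetch the version -> files mapping from PyPI's JSON API."""
    url = "https://pypi.org/pypi/{}/json".format(project)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["releases"]


def has_sdist(releases, version):
    """True if any file published for `version` is a source distribution."""
    return any(f.get("packagetype") == "sdist"
               for f in releases.get(version, []))


def wheel_only_versions(releases):
    """Versions that have uploaded files, none of which is an sdist."""
    return [v for v, files in releases.items()
            if files and not has_sdist(releases, v)]
```

For example, `has_sdist(fetch_releases("pyarrow"), "0.11.1")` asks exactly the question this issue turns on.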
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)