[
https://issues.apache.org/jira/browse/BEAM-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939925#comment-15939925
]
Mike Lambert commented on BEAM-1790:
------------------------------------
I believe it is the execution phase. This happened while it was downloading
all the packages to build a "source" repo to upload to the server:
{noformat}
INFO:root:Starting GCS upload to
gs://dancedeets-hrd.appspot.com/staging/beamapp-lambert-0324064754-553331.1490338074.553460/requirements.txt...
INFO:root:Completed GCS upload to
gs://dancedeets-hrd.appspot.com/staging/beamapp-lambert-0324064754-553331.1490338074.553460/requirements.txt
INFO:root:Executing command: ['/usr/local/opt/python/bin/python2.7', '-m',
'pip', 'install', '--download',
'/var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache',
'-r', 'requirements.txt', '--no-binary', ':all:']
DEPRECATION: pip install --download has been deprecated and will be removed in
the future. Pip now has a download command that should be used instead.
Collecting google-cloud-datastore (from -r requirements.txt (line 1))
File was already downloaded
/var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache/google-cloud-datastore-0.23.0.tar.gz
...
Collecting proto-google-cloud-datastore-v1[grpc]<0.91dev,>=0.90.3 (from
gapic-google-cloud-datastore-v1<0.16dev,>=0.15.0->google-cloud-datastore->-r
requirements.txt (line 1))
File was already downloaded
/var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache/proto-google-cloud-datastore-v1-0.90.3.tar.gz
Collecting setuptools (from
protobuf>=3.0.0->google-cloud-core<0.24dev,>=0.23.1->google-cloud-datastore->-r
requirements.txt (line 1))
File was already downloaded
/var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache/setuptools-34.3.2.zip
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "setuptools/__init__.py", line 12, in <module>
import setuptools.version
File "setuptools/version.py", line 1, in <module>
import pkg_resources
File "pkg_resources/__init__.py", line 70, in <module>
import packaging.version
ImportError: No module named packaging.version
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in
/private/var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/pip-build-tplMt1/setuptools/
Traceback (most recent call last):
File
"/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File
"/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py",
line 72, in _run_code
exec code in run_globals
File "/.../dataflow/popular_people.py", line 255, in <module>
run()
File "/.../dataflow/popular_people.py", line 252, in run
read_from_datastore('dancedeets-hrd', gcloud_options)
File "/.../dataflow/popular_people.py", line 243, in read_from_datastore
result = p.run()
File "/.../dataflow/lib/apache_beam/pipeline.py", line 163, in run
return self.runner.run(self)
File "/.../dataflow/lib/apache_beam/runners/dataflow/dataflow_runner.py",
line 175, in run
self.dataflow_client.create_job(self.job), self)
File "/.../dataflow/lib/apache_beam/utils/retry.py", line 174, in wrapper
return fun(*args, **kwargs)
File "/.../dataflow/lib/apache_beam/runners/dataflow/internal/apiclient.py",
line 411, in create_job
self.create_job_description(job)
File "/.../dataflow/lib/apache_beam/runners/dataflow/internal/apiclient.py",
line 432, in create_job_description
job.options, file_copy=self._gcs_file_copy)
File "/.../dataflow/lib/apache_beam/runners/dataflow/internal/dependency.py",
line 290, in stage_job_resources
setup_options.requirements_file, requirements_cache_path)
File "/.../dataflow/lib/apache_beam/runners/dataflow/internal/dependency.py",
line 226, in _populate_requirements_cache
processes.check_call(cmd_args)
File "/.../dataflow/lib/apache_beam/utils/processes.py", line 40, in
check_call
return subprocess.check_call(*args, **kwargs)
File
"/usr/local/Cellar/python/2.7.12_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py",
line 541, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/local/opt/python/bin/python2.7',
'-m', 'pip', 'install', '--download',
'/var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache',
'-r', 'requirements.txt', '--no-binary', ':all:']' returned non-zero exit
status 1
{noformat}
And I'm running the latest pip.
{noformat}
$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
{noformat}
Or, more specifically, using the interpreter from the command above:
{noformat}
$ /usr/local/opt/python/bin/python2.7
Python 2.7.12 (default, Oct 10 2016, 02:02:45)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pip
>>> pip.__version__
'9.0.1'
{noformat}
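To check whether the failure is independent of Beam, the pip invocation from the log above can be run on its own. This is a minimal sketch, assuming the argument list printed in the "Executing command" line is exactly what gets run; the function names are hypothetical and are not part of Beam:

```python
import subprocess
import sys


def build_download_cmd(python, cache_dir, requirements_file):
    """Return the argument list shown in the log for the given paths.

    Hypothetical helper; it only mirrors the logged command and does not
    call into Beam.
    """
    return [
        python, '-m', 'pip', 'install', '--download', cache_dir,
        '-r', requirements_file, '--no-binary', ':all:',
    ]


def reproduce(cache_dir, requirements_file='requirements.txt'):
    """Run the command with the current interpreter and return its exit status."""
    cmd = build_download_cmd(sys.executable, cache_dir, requirements_file)
    # subprocess.call returns the exit status instead of raising, so a
    # failing pip run shows up as a non-zero return value.
    return subprocess.call(cmd)
```

If `reproduce()` returns a non-zero status with the same `ImportError`, the problem lies in the pip command rather than in the rest of the pipeline. The deprecation warning in the log points at `pip download`, so replacing `install --download` with `download --dest` in the list would be the obvious variation to try; I have not tested that here.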
> Failure to build --requirements.txt when it uses google protobuf
> ----------------------------------------------------------------
>
> Key: BEAM-1790
> URL: https://issues.apache.org/jira/browse/BEAM-1790
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
> Reporter: Mike Lambert
> Assignee: Ahmet Altay
> Labels: build, requirements
>
> I am running with {{--requirements_file requirements.txt}}, which contains:
> {noformat}
> google-cloud-datastore
> {noformat}
> Unfortunately, when attempting to run this on Cloud Dataflow, I get the
> following error while building the requirements:
> {noformat}
> Collecting setuptools (from
> protobuf>=3.0.0->google-cloud-core<0.24dev,>=0.23.1->google-cloud-datastore->-r
> requirements.txt (line 3))
> File was already downloaded
> /var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache/setuptools-34.3.2.zip
> Complete output from command python setup.py egg_info:
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> File "setuptools/__init__.py", line 12, in <module>
> import setuptools.version
> File "setuptools/version.py", line 1, in <module>
> import pkg_resources
> File "pkg_resources/__init__.py", line 70, in <module>
> import packaging.version
> ImportError: No module named packaging.version
> {noformat}
> Looking online (https://github.com/pypa/setuptools/issues/937), it appears
> this is due to "pip asking setuptools to build itself (from source dist),
> which is no longer supported."
> I'm not sure what the correct fix is here, since protobuf depends on
> setuptools and a lot of Google libraries depend on protobuf. There seems to
> be no way to whitelist protobuf/setuptools as being "provided" by the Beam
> runtime (i.e. https://github.com/pypa/pip/issues/3090).
> I'm going to try using my own setup.py next and see if I can skirt around the
> issue, but this definitely seems like a bug in Beam's requirements packager,
> which may be asking for too much.
> In the case of GCE, I compile my dependencies into a Docker image that
> extends the base GCE images (and lets me use binary installs). I'm not sure
> whether something like that would work here.