[
https://issues.apache.org/jira/browse/BEAM-11275?focusedWorklogId=632287&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-632287
]
ASF GitHub Bot logged work on BEAM-11275:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 02/Aug/21 11:51
Start Date: 02/Aug/21 11:51
Worklog Time Spent: 10m
Work Description: calvinleungyk commented on pull request #15105:
URL: https://github.com/apache/beam/pull/15105#issuecomment-890962522
Hi @ibzib, here's what I have so far:
I am not adding an ArtifactInformation of URL type in `stager.py`; instead I am
writing the remote file paths to `EXTRA_PACKAGES_FILE = 'extra_packages.txt'`
in `sdks/python/apache_beam/runners/portability/stager.py`. This file is then
read in
[installExtraPackages](https://github.com/apache/beam/blob/dce846b36a4fb9140c4c5d14e10b72f835f03d98/sdks/python/container/piputil.go#L114)
and `pip` tries to install the package directly, which will fail on a private
GCS bucket. If I generate an ArtifactInformation of URL type, the worker will
eventually run
[extractStagingToPath](https://github.com/apache/beam/blob/dce846b36a4fb9140c4c5d14e10b72f835f03d98/sdks/go/pkg/beam/artifact/materialize.go#L139)
on every ArtifactInformation, which checks whether the ArtifactInformation has a
`URNStagingTo` role or whether its type is `URNFileArtifact`. Both checks
evaluate to `False`, so the function returns an error.
I might be missing some place where the worker is using the artifact service
to download artifacts as I'm not familiar with the worker code. If the above is
inaccurate, would you be able to show me where the worker would attempt to
fetch a URL artifact?
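To make the check described above concrete, here is a minimal Python model of it. This is only a sketch, not the Go code in `materialize.go`: the `Artifact` class and the `extract_staging_to_path` function are hypothetical stand-ins, and the URN strings are assumed to match the constants Beam defines.

```python
from dataclasses import dataclass

# URN strings assumed to mirror the constants used in materialize.go.
URN_STAGING_TO = "beam:artifact:role:staging_to:v1"
URN_FILE_ARTIFACT = "beam:artifact:type:file:v1"
URN_URL_ARTIFACT = "beam:artifact:type:url:v1"


@dataclass
class Artifact:
    """Hypothetical, simplified stand-in for ArtifactInformation."""
    type_urn: str
    role_urn: str = ""
    staged_name: str = ""  # meaningful when the role is URN_STAGING_TO
    path: str = ""         # meaningful when the type is URN_FILE_ARTIFACT


def extract_staging_to_path(artifact: Artifact) -> str:
    """Return the staged file name, or raise if neither check matches."""
    if artifact.role_urn == URN_STAGING_TO:
        return artifact.staged_name
    if artifact.type_urn == URN_FILE_ARTIFACT:
        return artifact.path
    raise ValueError(
        f"cannot extract staged path for type {artifact.type_urn!r}, "
        f"role {artifact.role_urn!r}")
```

In this model, `extract_staging_to_path(Artifact(type_urn=URN_URL_ARTIFACT))` raises, because a URL-type artifact with no staging-to role matches neither branch.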
As for integration tests, I am running into credential issues that prevent
the job from reaching the Compute Engine Metadata server, with the error
`WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server
unavailable onattempt 1 of 3. Reason: timed out`,
```
WARNING:apache_beam.internal.gcp.auth:Unable to find default credentials to
use: The Application Default Credentials are not available. They are available
if running in Google Compute Engine. Otherwise, the environment variable
GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the
credentials. See
https://developers.google.com/accounts/docs/application-default-credentials for
more information.
Connecting anonymously.
...
Failed to start a local webserver listening on either port 8080
or port 8090. Please check your firewall settings and locally
```
The Gradle error is:
```
FAILURE: Build failed with an exception.
* What went wrong:
Gradle build daemon disappeared unexpectedly (it may have been killed or may
have crashed)
```
Issue Time Tracking
-------------------
Worklog Id: (was: 632287)
Time Spent: 6.5h (was: 6h 20m)
> Support GCS files for extra_requirements argument in Python Beam portable
> runners
> ---------------------------------------------------------------------------------
>
> Key: BEAM-11275
> URL: https://issues.apache.org/jira/browse/BEAM-11275
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Gerard Casas Saez
> Assignee: Calvin Leung
> Priority: P2
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> Currently Portable runners only support locally available files for adding
> dependencies on remote workers. This can be seen in
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/stager.py#L429
> as it uses shutil.copyfile when it detects the file is remote and it is not http.
> An easy extension would be to extend _is_remote_path in Stager to detect if
> the path matches any filesystem and, if it does, then avoid downloading and let
> it be copied afterwards.
> Acceptance criteria:
> - `extra_package` can be a GCS path instead of requiring it to be local only.
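
The extension suggested in the description could look roughly like the following. This is a minimal sketch under assumptions: `KNOWN_REMOTE_SCHEMES`, `is_remote_path`, and `needs_local_copy` are hypothetical names, and the scheme set is illustrative rather than the list of filesystems Beam actually registers.

```python
from urllib.parse import urlparse

# Illustrative scheme set; an assumption, not Beam's registered filesystems.
KNOWN_REMOTE_SCHEMES = frozenset({"gs", "s3", "hdfs", "http", "https"})


def is_remote_path(path: str, schemes=KNOWN_REMOTE_SCHEMES) -> bool:
    """Return True if the path's URL scheme names a known remote filesystem."""
    if "://" not in path:
        return False
    return urlparse(path).scheme.lower() in schemes


def needs_local_copy(path: str) -> bool:
    """Local files are copied into staging; remote ones are left for later."""
    return not is_remote_path(path)
```

Under these assumptions, `is_remote_path("gs://bucket/pkg.tar.gz")` is true, so such a package would skip the local copy step and be fetched afterwards.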