[ 
https://issues.apache.org/jira/browse/BEAM-11275?focusedWorklogId=660686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-660686
 ]

ASF GitHub Bot logged work on BEAM-11275:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Oct/21 01:47
            Start Date: 06/Oct/21 01:47
    Worklog Time Spent: 10m 
      Work Description: aaltay commented on pull request #15105:
URL: https://github.com/apache/beam/pull/15105#issuecomment-935235886


   > I updated the PR and also tested the code with the following command 
(private info redacted):
   > 
   > ```
   > python3 sdks/python/apache_beam/examples/wordcount.py --input 
gs://dataflow-samples/shakespeare/kinglear.txt --output 
gs://<scratch-bucket>/counts --runner DataflowRunner --project <project> 
--region <region> --temp_location gs://<scratch-bucket>/tmp_beam 
--extra_package="gs://<gcs-bucket>/extra.whl" --sdk_location=container 
--no_use_public_ips --service_account_email=email.com --network=network 
--subnetwork=https://www.googleapis.com/compute/v1/projects/... 
--experiment=shuffle_mode=service
   > ```
   > 
   > The job launched successfully with the following logs:
   > 
   > ```
   > INFO:apache_beam.runners.portability.stager:Downloading extra package: 
gs://beam-dataflow-it/wheel/tfx_twitter.whl locally before staging
   > INFO:apache_beam.runners.portability.stager:Copied remote file from 
gs://beam-dataflow-it/wheel/tfx_twitter.whl to 
/var/folders/jl/3vwrt5kd6vg9vrhjpyy6b8dh0000gp/T/tmpm_bqr7q1/tmp9n_at3dp/tfx_twitter.whl.
   > ...
   > INFO:apache_beam.runners.dataflow.internal.apiclient:Completed GCS upload 
to 
gs://scratch-user.calvinl.dp.gcp.twttr.net/tmp_beam/beamapp-calvinl-1005095238-436424.1633427558.436689/tfx_twitter.whl
 in 0 seconds.
   > ```
   > 
   > However, the job did not finish due to an internal GCP issue that leads to 
`Error syncing pod` on Dataflow. If you'd like to see a finished job, I can 
add the logs once we resolve that internally.
   
   I do not need to see the logs, as long as you can confirm that you were able to 
run a successful job.
   
   "Error syncing pod" -> this could have different causes: (i) failing to find 
and download a container, or (ii) the container code failing at startup (which 
might be caused by this change?)
   
   Let us know if we can help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 660686)
    Time Spent: 13h 40m  (was: 13.5h)

> Support GCS files for extra_requirements argument in Python Beam portable 
> runners
> ---------------------------------------------------------------------------------
>
>                 Key: BEAM-11275
>                 URL: https://issues.apache.org/jira/browse/BEAM-11275
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Gerard Casas Saez
>            Assignee: Calvin Leung
>            Priority: P2
>          Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> Currently Portable runners only support locally available files for adding 
> dependencies on remote workers. This can be seen in 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/stager.py#L429
>  as it uses shutil.copyfile when it detects file is remote and it is not http.
> An easy extension would be to extend _is_remote_path in Stager to detect if 
> the path matches any filesystem and, if it does, then avoid downloading and 
> let it be copied afterwards. 
> Acceptance criteria:
> - `extra_package` can be a GCS path instead of requiring it to be local only.
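
The check described in the issue can be sketched roughly as follows. This is only an illustration, not the actual Stager code from the PR: the helper names `is_remote_path` and `needs_http_download` and the `REMOTE_SCHEMES` set are hypothetical, and a real implementation would presumably ask Beam's filesystem registry which schemes are supported instead of hard-coding them.

```python
from urllib.parse import urlparse

# Hypothetical list of schemes treated as remote filesystems.
# Assumption for illustration only; not taken from Beam's stager.py.
REMOTE_SCHEMES = {"gs", "s3", "hdfs", "http", "https"}


def is_remote_path(path: str) -> bool:
    """Return True when the path carries one of the known remote schemes."""
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES


def needs_http_download(path: str) -> bool:
    """Return True only for http(s) URLs, which are fetched over HTTP."""
    return urlparse(path).scheme.lower() in {"http", "https"}


if __name__ == "__main__":
    for candidate in ("gs://<gcs-bucket>/extra.whl", "/tmp/extra.whl"):
        print(candidate, is_remote_path(candidate))
```

With a check like this, a `gs://` path would be recognized as remote and handed to a filesystem-aware copy, while a plain local path would still go through the local copy step.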



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
