damccorm opened a new issue, #21214:
URL: https://github.com/apache/beam/issues/21214

   When a pipeline that uses a GCP IO runs on a portable runner with Docker as
the SDK harness environment, the SDK harness does not have the user's GCP
credentials available, and the pipeline fails. There are apparently
[pipeline options for setting credentials](https://github.com/apache/beam/blob/v2.33.0/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L170),
but as far as I can tell they are meant either only for non-portable
pipelines or only for the Dataflow runner.
   
   The tricky part of implementing this is that GCP credentials are not
straightforward: making them available to something like the Application
Default Credentials (ADC) API can involve copying over multiple files or
environment variables. The following article gives a lot of context on the
difficulties involved:
[https://medium.com/datamindedbe/application-default-credentials-477879e31cb5](https://medium.com/datamindedbe/application-default-credentials-477879e31cb5)
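To illustrate why this is awkward, here is a minimal Python sketch of where ADC looks for a *local* credential file, i.e. what would have to be copied or mounted into an SDK harness container. It is a simplification under stated assumptions: Linux/macOS paths only, only the two file-based steps of the ADC lookup order, and `find_adc_file` is a hypothetical helper, not part of Beam or google-auth.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Optional[Mapping[str, str]] = None,
                  home: Optional[Path] = None) -> Optional[Path]:
    """Return the local credential file ADC would likely use, or None.

    Simplified lookup order (Linux/macOS only, an assumption):
      1. The file named by GOOGLE_APPLICATION_CREDENTIALS.
      2. gcloud's well-known file, application_default_credentials.json,
         under $CLOUDSDK_CONFIG or ~/.config/gcloud.
    If neither applies, real ADC falls back to environment-provided
    credentials such as the GCE metadata server, which a Docker container
    on a non-GCP machine usually cannot use, so this returns None.
    """
    env = dict(os.environ) if env is None else env
    home = Path.home() if home is None else home

    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        # Real ADC raises an error for a missing file; this sketch returns None.
        return path if path.is_file() else None

    config_dir = Path(env.get("CLOUDSDK_CONFIG", home / ".config" / "gcloud"))
    well_known = config_dir / "application_default_credentials.json"
    return well_known if well_known.is_file() else None
```

Even in this reduced form, the answer depends on both environment variables and files in the user's home directory, which is why simply "passing the credentials" to a container is not a one-step operation.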
   
   Possible solutions (note that these are mostly untested):
    - Volume-mount directories containing credentials when calling the
"docker run" command, ideally configured via some sort of pipeline option.
(The same mechanism could also give Docker containers directories to write
output files to with TextIO or FileIO.) See the article above for an example.
    ** This solution may not work with runners on remote endpoints, though:
the mounted directory must be on the same machine as the Docker container,
which may not be possible with some remote runners.
    - Require custom containers that provide the appropriate credentials. This
is more robust than the solution above but less user-friendly, and would
require a good amount of documentation.
    ** This could be combined with the solution above, and might be a good way
of supporting GCP credentials on remote runners. Custom containers can store
any valid credentials of the user's choice (for example, service account
credentials for a production service) and can then be run on any machine.
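The volume-mounting option could look roughly like the following Python sketch, which builds a `docker run` argument list that mounts one host credential file read-only and points GOOGLE_APPLICATION_CREDENTIALS at it inside the container, plus optional extra mounts (e.g. for output directories). This is not how Beam currently launches SDK harness containers; the function `docker_run_args`, the container path `/tmp/gcp/...`, and feeding it from a pipeline option are all hypothetical.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

# Hypothetical location inside the container; any path would do.
CONTAINER_CRED_PATH = "/tmp/gcp/application_default_credentials.json"


def docker_run_args(
    image: str,
    credentials_file: Path,
    extra_mounts: Sequence[Tuple[str, str]] = (),
) -> List[str]:
    """Build a `docker run` argument list that exposes host credentials.

    credentials_file is bind-mounted read-only, and
    GOOGLE_APPLICATION_CREDENTIALS is set to its container path so that ADC
    inside the container finds it without gcloud being installed.
    extra_mounts are (host_dir, container_dir) pairs, e.g. output directories
    for TextIO or FileIO.
    """
    args = [
        "docker", "run", "--rm",
        "-v", f"{credentials_file.resolve()}:{CONTAINER_CRED_PATH}:ro",
        "-e", f"GOOGLE_APPLICATION_CREDENTIALS={CONTAINER_CRED_PATH}",
    ]
    for host_dir, container_dir in extra_mounts:
        args += ["-v", f"{Path(host_dir).resolve()}:{container_dir}"]
    args.append(image)
    return args
```

The resulting list could be passed to `subprocess.run`. Because bind-mount source paths are resolved on the Docker host, this only helps when containers are started on the same machine that holds the credentials, which is exactly the limitation noted above for remote runners.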
   
   Imported from Jira 
[BEAM-13215](https://issues.apache.org/jira/browse/BEAM-13215). Original Jira 
may contain additional context.
   Reported by: danoliveira.
