Lasse Karls created BEAM-14165:
----------------------------------

             Summary: Specify GCS Object Version in apache_beam.io.gcp.gcsio
                 Key: BEAM-14165
                 URL: https://issues.apache.org/jira/browse/BEAM-14165
             Project: Beam
          Issue Type: Improvement
          Components: io-py-gcp
    Affects Versions: 2.37.0
            Reporter: Lasse Karls


I would like to specify a generation when accessing a GCS object via the Beam 
filesystem.
On the command line, gsutil can access a specific version of an object with the 
following syntax:

{code:sh}
gsutil cp gs://{bucket}/{object_path}#{generation} .
{code}

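The corresponding Python code would look something like this:
{code:python}
from apache_beam.io.filesystems import FileSystems

# Proposed syntax, not supported yet: the generation follows the '#'.
with FileSystems.open("gs://{bucket}/{object_path}#{generation}") as f:
    pass
{code}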

Fortunately, the [StorageObjectsGetRequest|https://github.com/apache/beam/blob/14862ccbdf2879574b6ce49149bdd7c9bf197322/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_messages.py#L2133] can already be passed a generation.
However, this is +*not done*+ within the [GcsDownloader|https://github.com/apache/beam/blob/14862ccbdf2879574b6ce49149bdd7c9bf197322/sdks/python/apache_beam/io/gcp/gcsio.py#L611].

I think the generation should be extracted as well when [parsing the GCS path|https://github.com/apache/beam/blob/14862ccbdf2879574b6ce49149bdd7c9bf197322/sdks/python/apache_beam/io/gcp/gcsio.py#L583].
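
As a rough illustration of that extraction, here is a minimal sketch. It assumes the gsutil-style convention of a trailing {{#<digits>}} suffix. The function name {{parse_gcs_path_with_generation}} and the regular expression are hypothetical and are not part of Beam's gcsio module.

```python
import re

# Hypothetical pattern: bucket, object name, and an optional numeric
# generation after a trailing '#'. This is an assumption for illustration,
# not the expression Beam's gcsio module uses.
_GCS_PATH_RE = re.compile(r'^gs://([^/]+)/(.+?)(?:#(\d+))?$')


def parse_gcs_path_with_generation(path):
    """Split a GCS path into (bucket, object_name, generation).

    generation is an int when the path ends in '#<digits>', else None.
    Caveat: an object whose real name ends in '#<digits>' would be
    misread as a versioned reference under this convention.
    """
    match = _GCS_PATH_RE.match(path)
    if match is None:
        raise ValueError('Not a valid GCS object path: %r' % path)
    bucket, object_name, generation = match.groups()
    return bucket, object_name, int(generation) if generation else None
```

The returned generation could then be forwarded to the existing {{generation}} field of StorageObjectsGetRequest. A {{#}} that is not followed only by digits stays part of the object name, so unversioned paths keep their current meaning.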

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
