Lasse Karls created BEAM-14165:
----------------------------------
Summary: Specify GCS Object Version in apache_beam.io.gcp.gcsio
Key: BEAM-14165
URL: https://issues.apache.org/jira/browse/BEAM-14165
Project: Beam
Issue Type: Improvement
Components: io-py-gcp
Affects Versions: 2.37.0
Reporter: Lasse Karls
I would like to specify a generation when accessing a GCS object via the Beam
filesystem.
On the command line, gsutil can access a specific version with the following
syntax:
{code:sh}
gsutil cp gs://{bucket}/{object_path}#{generation} .
{code}
The corresponding Python code would look something like this:
{code:python}
from apache_beam.io.filesystems import FileSystems

with FileSystems.open("gs://{bucket}/{object_path}#{generation}") as f:
    pass
{code}
Fortunately, the
[StorageObjectsGetRequest|https://github.com/apache/beam/blob/14862ccbdf2879574b6ce49149bdd7c9bf197322/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_messages.py#L2133]
can already be passed a generation.
However, this is +*not done*+ within the
[GcsDownloader|https://github.com/apache/beam/blob/14862ccbdf2879574b6ce49149bdd7c9bf197322/sdks/python/apache_beam/io/gcp/gcsio.py#L611].
I think when [parsing the GCS
path|https://github.com/apache/beam/blob/14862ccbdf2879574b6ce49149bdd7c9bf197322/sdks/python/apache_beam/io/gcp/gcsio.py#L583]
the generation should be extracted as well.
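To illustrate the extraction step, here is a minimal standalone sketch of how a
trailing "#{generation}" suffix might be split off a GCS path. The names
GcsPath and parse_gcs_path_with_generation are hypothetical and are not part of
Beam; the sketch also assumes that a purely numeric suffix after the last "#"
is always a generation, which is ambiguous for object names that happen to end
that way.

```python
import re
from typing import NamedTuple, Optional


class GcsPath(NamedTuple):
    """Hypothetical container for the parts of a GCS path."""
    bucket: str
    name: str
    generation: Optional[int]


# Assumed path shape: gs://<bucket>/<object>, optionally followed by
# "#<digits>" naming the object generation (as in the gsutil syntax above).
_GCS_PATH = re.compile(r"^gs://([^/]+)/(.+?)(?:#(\d+))?$")


def parse_gcs_path_with_generation(path: str) -> GcsPath:
    """Split a GCS path into bucket, object name, and optional generation."""
    match = _GCS_PATH.match(path)
    if match is None:
        raise ValueError(
            f"GCS path must be in the form gs://<bucket>/<object>: {path!r}")
    bucket, name, generation = match.groups()
    # A missing suffix yields None, meaning "use the live version".
    return GcsPath(bucket, name, int(generation) if generation else None)


parsed = parse_gcs_path_with_generation(
    "gs://my-bucket/dir/file.txt#1647000000000000")
print(parsed.bucket, parsed.name, parsed.generation)
```

The parsed generation could then presumably be forwarded to the
StorageObjectsGetRequest built by the downloader, leaving it unset when the
path carries no suffix so that existing behavior is unchanged.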
--
This message was sent by Atlassian Jira
(v8.20.1#820001)