Yi Hu created BEAM-14393:
----------------------------
Summary: Obtain metadata field at once in file system's IO
connectors
Key: BEAM-14393
URL: https://issues.apache.org/jira/browse/BEAM-14393
Project: Beam
Issue Type: Improvement
Components: io-py-common
Reporter: Yi Hu
Fix For: 3.0.0
This task involves refactoring and improving the file-metadata-related
methods of the IO connectors (GcsIO, S3IO, BlobIO, hadoop).
Currently, we have individual methods like size, last_updated, checksum, and
others. Each one makes an HTTP request to get its specific metadata field.
If one needs to gather multiple metadata fields, each method has to be called
separately, which makes multiple requests under the hood. However, the HTTP
response already contains multiple file metadata fields; each time only one
field is collected and the others are discarded.
We should have a public method that returns a named tuple containing
multiple file metadata fields. Its implementation should make only one
request, just as each existing single-field method does.
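
A rough sketch of the idea, to make the proposal concrete. Everything named
here is hypothetical and for illustration only: FileStatus, file_status,
SketchIO, FakeStorageClient, and the response keys "size", "updated", and
"crc32c" are assumptions, not existing Beam or cloud client APIs. The fake
client just counts requests so the saving is visible.

```python
from typing import Dict, NamedTuple


class FileStatus(NamedTuple):
    """Hypothetical container for the fields one metadata response carries."""
    path: str
    size: int
    last_updated: float
    checksum: str


class FakeStorageClient:
    """Stand-in for a real storage client (assumption); counts requests."""

    def __init__(self, objects: Dict[str, dict]) -> None:
        self._objects = objects
        self.request_count = 0

    def get_object_metadata(self, path: str) -> dict:
        # A real connector would issue one HTTP request here.
        self.request_count += 1
        return self._objects[path]


class SketchIO:
    """Illustrative connector; not the actual GcsIO/S3IO/BlobIO code."""

    def __init__(self, client: FakeStorageClient) -> None:
        self._client = client

    def file_status(self, path: str) -> FileStatus:
        """Return several metadata fields from a single request."""
        meta = self._client.get_object_metadata(path)
        return FileStatus(
            path=path,
            size=meta["size"],
            last_updated=meta["updated"],
            checksum=meta["crc32c"],
        )

    # Existing single-field methods can delegate to file_status, so each
    # still costs exactly one request when called on its own.
    def size(self, path: str) -> int:
        return self.file_status(path).size

    def last_updated(self, path: str) -> float:
        return self.file_status(path).last_updated

    def checksum(self, path: str) -> str:
        return self.file_status(path).checksum
```

With this shape, a caller that needs size, last_updated, and checksum calls
file_status once and reads three attributes, instead of calling three
methods that each trigger their own request.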