wilmerdooley commented on code in PR #38751:
URL: https://github.com/apache/beam/pull/38751#discussion_r3328957296
##########
sdks/python/apache_beam/io/gcp/gcsfilesystem.py:
##########
@@ -43,13 +52,32 @@ class GCSFileSystem(FileSystem):
   """A GCS ``FileSystem`` implementation for accessing files on GCS.
   """
-  CHUNK_SIZE = gcsio.MAX_BATCH_OPERATION_SIZE  # Chuck size in batch operations
   GCS_PREFIX = 'gs://'
   def __init__(self, pipeline_options):
     super().__init__(pipeline_options)
     self._pipeline_options = pipeline_options
+  @staticmethod
+  def _get_gcsio_module():
+    """Return the ``gcsio`` module, raising ImportError if it is unavailable.
+
+    ``gcsio`` is imported lazily (see the module-level import) so that this
+    filesystem can be looked up without the gcp extra installed. The dependency
+    is only required when the filesystem is actually used.
+    """
+    if gcsio is None:
+      raise ImportError(
+          'Could not import apache_beam.io.gcp.gcsio. This usually means the '
+          'gcp dependencies are not installed. Install them with: '
+          'pip install apache-beam[gcp]')
+    return gcsio
+
+  @property
+  def CHUNK_SIZE(self):
+    """Chunk size in batch operations."""
+    return self._get_gcsio_module().MAX_BATCH_OPERATION_SIZE
Review Comment:
Thanks, good catch. Fixed in the latest push: CHUNK_SIZE is now exposed via
a small class-property descriptor (the pattern you suggested), so
GCSFileSystem.CHUNK_SIZE resolves at both the class and instance level,
matching S3FileSystem's class attribute, while staying lazy. I also added a
test (test_chunk_size_on_class_and_instance) covering both the class and
instance access paths.
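
For readers following along, here is a minimal sketch of the class-property
pattern described above. It is not the code from the push: `classproperty`,
`_FakeGcsio`, and `GCSFileSystemSketch` are hypothetical names used only for
illustration, and the value 100 is a placeholder rather than gcsio's real
batch limit.

```python
class classproperty:
  """Read-only descriptor that resolves on a class and on its instances."""
  def __init__(self, fget):
    self.fget = fget

  def __get__(self, instance, owner=None):
    # Class access passes instance=None; instance access passes both.
    if owner is None:
      owner = type(instance)
    return self.fget(owner)


class _FakeGcsio:
  """Hypothetical stand-in for the optional gcsio module."""
  MAX_BATCH_OPERATION_SIZE = 100  # placeholder value, an assumption


# In the real module this would be None when the gcp extra is missing.
gcsio = _FakeGcsio


class GCSFileSystemSketch:
  @staticmethod
  def _get_gcsio_module():
    if gcsio is None:
      raise ImportError('gcp dependencies are not installed')
    return gcsio

  @classproperty
  def CHUNK_SIZE(cls):
    # Evaluated on each access, so defining the class never touches gcsio.
    return cls._get_gcsio_module().MAX_BATCH_OPERATION_SIZE


class_level = GCSFileSystemSketch.CHUNK_SIZE
instance_level = GCSFileSystemSketch().CHUNK_SIZE
```

With a plain `@property`, `GCSFileSystemSketch.CHUNK_SIZE` would return the
property object itself at the class level. The descriptor's `__get__` is what
makes the class-level lookup return the value.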