Copilot commented on code in PR #37488:
URL: https://github.com/apache/beam/pull/37488#discussion_r2762803363
##########
sdks/python/apache_beam/io/gcp/gcsfilesystem.py:
##########
@@ -34,16 +34,30 @@
from apache_beam.io.filesystem import CompressionTypes
from apache_beam.io.filesystem import FileMetadata
from apache_beam.io.filesystem import FileSystem
-from apache_beam.io.gcp import gcsio
__all__ = ['GCSFileSystem']
class GCSFileSystem(FileSystem):
"""A GCS ``FileSystem`` implementation for accessing files on GCS.
"""
+ def _get_gcsio(self):
+ try:
+ from apache_beam.io.gcp import gcsio
+ return gcsio
+ except ImportError:
+ raise ImportError(
+ 'GCSFileSystem requires apache-beam[gcp]. '
+ 'Install it with: pip install apache-beam[gcp]')
+
+ _CHUNK_SIZE = None
+
+ @property
+ def CHUNK_SIZE(self):
+ if self._CHUNK_SIZE is None:
+ self._CHUNK_SIZE = self._get_gcsio().MAX_BATCH_OPERATION_SIZE
+ return self._CHUNK_SIZE # Chuck size in batch operations
Review Comment:
Typo in comment: "Chuck size" should be "Chunk size".
```suggestion
return self._CHUNK_SIZE # Chunk size in batch operations
```
##########
sdks/python/apache_beam/io/gcp/gcsfilesystem.py:
##########
@@ -34,16 +34,30 @@
from apache_beam.io.filesystem import CompressionTypes
from apache_beam.io.filesystem import FileMetadata
from apache_beam.io.filesystem import FileSystem
-from apache_beam.io.gcp import gcsio
__all__ = ['GCSFileSystem']
class GCSFileSystem(FileSystem):
"""A GCS ``FileSystem`` implementation for accessing files on GCS.
"""
+ def _get_gcsio(self):
+ try:
+ from apache_beam.io.gcp import gcsio
+ return gcsio
+ except ImportError:
+ raise ImportError(
+ 'GCSFileSystem requires apache-beam[gcp]. '
+ 'Install it with: pip install apache-beam[gcp]')
+
+ _CHUNK_SIZE = None
+
+ @property
+ def CHUNK_SIZE(self):
+ if self._CHUNK_SIZE is None:
+ self._CHUNK_SIZE = self._get_gcsio().MAX_BATCH_OPERATION_SIZE
+ return self._CHUNK_SIZE # Chuck size in batch operations
Review Comment:
`_CHUNK_SIZE` is defined as a class variable (shared across all instances),
but it's being used as an instance variable in the property getter. This
creates a subtle bug where if one instance initializes `_CHUNK_SIZE`, all other
instances will see it as initialized, even if they haven't accessed the gcsio
module yet. Consider making this a true instance variable by initializing it in
`__init__` as `self._chunk_size = None` and updating the property accordingly.
##########
examples/java/src/main/java/org/apache/beam/examples/cookbook/TriggerExample.java:
##########
@@ -486,7 +486,10 @@ static void runTriggerExample(TrafficFlowOptions options)
throws IOException {
.get(i)
.apply(
"BigQuery Write Rows" + i,
-
BigQueryIO.writeTableRows().to(tableRef).withSchema(getSchema()));
+ BigQueryIO.writeTableRows()
+ .to(tableRef)
+ .withSchema(getSchema())
+
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
Review Comment:
This Java file change appears unrelated to the Python GCS filesystem lazy
loading feature described in the PR. The change adds a write disposition to a
BigQuery write operation in a Java example, which has no connection to making
GCS filesystem lookup lazy in Python. This should likely be in a separate PR.
##########
sdks/python/apache_beam/io/gcp/gcsfilesystem_test.py:
##########
@@ -38,6 +39,12 @@
# pylint: enable=wrong-import-order, wrong-import-position
+class GCSFileSystemLazyLoadTest(unittest.TestCase):
+ def test_get_filesystem_does_not_require_gcp_extra(self):
+ fs = FileSystems.get_filesystem('gs://test-bucket/path')
+ self.assertEqual(fs.scheme(), 'gs')
Review Comment:
The test validates that `get_filesystem()` succeeds without GCP extras, but
it doesn't verify the second part of the lazy loading behavior: that actual
filesystem operations (like `open()`, `match()`, etc.) properly raise an
ImportError when GCP dependencies are missing. Consider adding a test that
verifies an ImportError is raised when attempting to use the filesystem without
GCP dependencies installed, to ensure the lazy loading works as intended.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]