Copilot commented on code in PR #37488:
URL: https://github.com/apache/beam/pull/37488#discussion_r2762803363


##########
sdks/python/apache_beam/io/gcp/gcsfilesystem.py:
##########
@@ -34,16 +34,30 @@
 from apache_beam.io.filesystem import CompressionTypes
 from apache_beam.io.filesystem import FileMetadata
 from apache_beam.io.filesystem import FileSystem
-from apache_beam.io.gcp import gcsio
 
 __all__ = ['GCSFileSystem']
 
 
 class GCSFileSystem(FileSystem):
   """A GCS ``FileSystem`` implementation for accessing files on GCS.
   """
+  def _get_gcsio(self):
+    try:
+      from apache_beam.io.gcp import gcsio
+      return gcsio
+    except ImportError:
+      raise ImportError(
+          'GCSFileSystem requires apache-beam[gcp]. '
+          'Install it with: pip install apache-beam[gcp]')
+
+  _CHUNK_SIZE = None
+
+  @property
+  def CHUNK_SIZE(self):
+    if self._CHUNK_SIZE is None:
+      self._CHUNK_SIZE = self._get_gcsio().MAX_BATCH_OPERATION_SIZE
+    return self._CHUNK_SIZE  # Chuck size in batch operations

Review Comment:
   Typo in comment: "Chuck size" should be "Chunk size".
   ```suggestion
       return self._CHUNK_SIZE  # Chunk size in batch operations
   ```



##########
sdks/python/apache_beam/io/gcp/gcsfilesystem.py:
##########
@@ -34,16 +34,30 @@
 from apache_beam.io.filesystem import CompressionTypes
 from apache_beam.io.filesystem import FileMetadata
 from apache_beam.io.filesystem import FileSystem
-from apache_beam.io.gcp import gcsio
 
 __all__ = ['GCSFileSystem']
 
 
 class GCSFileSystem(FileSystem):
   """A GCS ``FileSystem`` implementation for accessing files on GCS.
   """
+  def _get_gcsio(self):
+    try:
+      from apache_beam.io.gcp import gcsio
+      return gcsio
+    except ImportError:
+      raise ImportError(
+          'GCSFileSystem requires apache-beam[gcp]. '
+          'Install it with: pip install apache-beam[gcp]')
+
+  _CHUNK_SIZE = None
+
+  @property
+  def CHUNK_SIZE(self):
+    if self._CHUNK_SIZE is None:
+      self._CHUNK_SIZE = self._get_gcsio().MAX_BATCH_OPERATION_SIZE
+    return self._CHUNK_SIZE  # Chuck size in batch operations

Review Comment:
   `_CHUNK_SIZE` is declared as a class attribute but assigned through `self` in 
the property getter. That assignment creates a per-instance attribute which 
shadows the class-level default, so whether the cached value is shared or 
per-instance hinges on a subtle attribute-lookup detail and is easy to misread. 
Consider making it a true instance variable by initializing 
`self._chunk_size = None` in `__init__` and updating the property accordingly.
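   A minimal sketch of that suggestion, reusing the `_get_gcsio` helper from the 
diff above and assuming the base `FileSystem.__init__(pipeline_options)` 
signature:
   ```python
   class GCSFileSystem(FileSystem):
     """A GCS ``FileSystem`` implementation for accessing files on GCS."""
     def __init__(self, pipeline_options):
       super().__init__(pipeline_options)
       self._chunk_size = None  # per-instance cache, filled on first access

     @property
     def CHUNK_SIZE(self):
       # Chunk size used in batch operations; resolved lazily so that
       # importing this module does not require apache-beam[gcp].
       if self._chunk_size is None:
         self._chunk_size = self._get_gcsio().MAX_BATCH_OPERATION_SIZE
       return self._chunk_size
   ```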



##########
examples/java/src/main/java/org/apache/beam/examples/cookbook/TriggerExample.java:
##########
@@ -486,7 +486,10 @@ static void runTriggerExample(TrafficFlowOptions options) throws IOException {
           .get(i)
           .apply(
               "BigQuery Write Rows" + i,
-              BigQueryIO.writeTableRows().to(tableRef).withSchema(getSchema()));
+              BigQueryIO.writeTableRows()
+                  .to(tableRef)
+                  .withSchema(getSchema())
+                  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

Review Comment:
   This Java change appears unrelated to the Python GCS filesystem lazy-loading 
feature described in the PR: it adds a write disposition to a BigQuery write in 
a Java example and has no connection to making the GCS filesystem lookup lazy 
in Python. It should likely go in a separate PR.



##########
sdks/python/apache_beam/io/gcp/gcsfilesystem_test.py:
##########
@@ -38,6 +39,12 @@
 # pylint: enable=wrong-import-order, wrong-import-position
 
 
+class GCSFileSystemLazyLoadTest(unittest.TestCase):
+  def test_get_filesystem_does_not_require_gcp_extra(self):
+    fs = FileSystems.get_filesystem('gs://test-bucket/path')
+    self.assertEqual(fs.scheme(), 'gs')

Review Comment:
   The test validates that `get_filesystem()` succeeds without the GCP extras, 
but it doesn't cover the second half of the lazy-loading behavior: that actual 
filesystem operations (`open()`, `match()`, etc.) raise an `ImportError` when 
the GCP dependencies are missing. Consider adding a test for that case to 
ensure the lazy loading works as intended.
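   A minimal sketch of what such a test could look like, simulating a missing 
`apache-beam[gcp]` install by patching the `_get_gcsio` helper from the diff 
above to fail; the patch target and the assumption that `match()` goes through 
that helper are not taken from the PR:
   ```python
   import unittest
   from unittest import mock

   from apache_beam.io.filesystems import FileSystems


   class GCSFileSystemMissingGcpDepsTest(unittest.TestCase):
     def test_operations_raise_import_error_without_gcp_extra(self):
       fs = FileSystems.get_filesystem('gs://test-bucket/path')
       # Force the lazy import helper to fail, as it would without the gcp
       # extra installed, and check that an operation surfaces the error.
       with mock.patch.object(
           type(fs), '_get_gcsio',
           side_effect=ImportError('gcsio unavailable')):
         with self.assertRaises(ImportError):
           fs.match(['gs://test-bucket/path/*'])
   ```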



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
