kennknowles commented on code in PR #33384:
URL: https://github.com/apache/beam/pull/33384#discussion_r1890706630


##########
sdks/python/apache_beam/io/gcp/gcsfilesystem.py:
##########
@@ -377,3 +377,17 @@ def report_lineage(self, path, lineage, level=None):
       # bucket only
       components = components[:-1]
     lineage.add('gcs', *components)
+
+  def check_splittability(self, path):
+    try:
+      file_metadata = self._gcsIO()._status(path)
+      if file_metadata.get('content_encoding', None) == 'gzip':

Review Comment:
   Doesn't the content-type also have to be a specific value, in addition to 
the content-encoding being set to gzip?
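
A minimal sketch of the kind of check this question is about, assuming the metadata is available as a plain mapping. The `content_encoding` key comes from the diff above; the `content_type` key, the function name `may_be_transcoded`, and the `required_content_types` parameter are hypothetical and only illustrate how an additional content-type condition could be expressed if one turns out to be needed.

```python
from typing import Iterable, Mapping, Optional


def may_be_transcoded(
    metadata: Mapping[str, Optional[str]],
    required_content_types: Optional[Iterable[str]] = None) -> bool:
  """Returns True if the metadata suggests the object is gzip-encoded.

  Hypothetical helper: when required_content_types is given, the
  content type must also match one of them (parameters such as
  "; charset=utf-8" are ignored).
  """
  encoding = (metadata.get('content_encoding') or '').strip().lower()
  if encoding != 'gzip':
    return False
  if required_content_types is None:
    # No extra condition requested: the encoding alone decides.
    return True
  content_type = (metadata.get('content_type') or '')
  base_type = content_type.split(';')[0].strip().lower()
  return base_type in {t.lower() for t in required_content_types}
```

With this shape, a caller would treat a file as unsplittable whenever `may_be_transcoded(...)` returns True, and the open question above reduces to whether `required_content_types` should be set.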



##########
sdks/python/apache_beam/io/filebasedsource.py:
##########
@@ -259,7 +259,15 @@ def splittable(self):
     return self._splittable
 
 
+def _is_decompressive_transcoding_enabled(file_path):
+
+  return True

Review Comment:
   ?
   
   (Am I parsing this right? This looks like a top-level function definition 
with a leading underscore, and the body of the function is a stub.)



##########
sdks/python/apache_beam/io/filesystem.py:
##########
@@ -945,3 +945,6 @@ def report_lineage(self, path, unused_lineage, level=None):
     Unless override by FileSystem implementations, default to no-op.
     """
     pass
+
+  def check_splittability(self, path):
+    return True

Review Comment:
   This should probably not always be true. If this is meant as a default, 
perhaps there should be no default at all: make the method abstract and 
implement it for each filesystem. If it does stay as the default, add a comment 
saying so, so that we understand why it ignores the argument.
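
The two options in this comment can be sketched side by side. This is an illustration only, not the actual `FileSystem` class from `filesystem.py`: the class names below are hypothetical, and only the method name `check_splittability` is taken from the diff.

```python
import abc


class AbstractSplittabilityFileSystem(abc.ABC):
  """Option 1 (hypothetical): no default, every filesystem must decide."""
  @abc.abstractmethod
  def check_splittability(self, path):
    """Returns True if the file at path can be read in splits."""
    raise NotImplementedError


class AlwaysSplittableFileSystem(AbstractSplittabilityFileSystem):
  """Example concrete filesystem that opts in explicitly."""
  def check_splittability(self, path):
    return True


class DefaultSplittabilityFileSystem:
  """Option 2 (hypothetical): keep a documented default."""
  def check_splittability(self, path):
    # Default for filesystems with no splittability restrictions.
    # The path argument is intentionally unused here; implementations
    # that need to inspect file metadata should override this method.
    del path
    return True
```

With option 1, forgetting to implement the method fails at instantiation time with a `TypeError`; with option 2, existing filesystems keep working unchanged, and the comment records why the argument is ignored.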



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

