kennknowles commented on code in PR #33384:
URL: https://github.com/apache/beam/pull/33384#discussion_r1890706630


##########
sdks/python/apache_beam/io/gcp/gcsfilesystem.py:
##########
@@ -377,3 +377,17 @@ def report_lineage(self, path, lineage, level=None):
       # bucket only
       components = components[:-1]
     lineage.add('gcs', *components)
+
+  def check_splittability(self, path):
+    try:
+      file_metadata = self._gcsIO()._status(path)
+      if file_metadata.get('content_encoding', None) == 'gzip':

Review Comment:
   Doesn't the content-type also have to be a particular thing in addition to the content-encoding being set to gzip?



##########
sdks/python/apache_beam/io/filebasedsource.py:
##########
@@ -259,7 +259,15 @@ def splittable(self):
     return self._splittable
+def _is_decompressive_transcoding_enabled(file_path):
+
+  return True

Review Comment:
   ? (am I parsing this right? it seems like a function definition at the top level but with a leading underscore and the body of the function is a stub)



##########
sdks/python/apache_beam/io/filesystem.py:
##########
@@ -945,3 +945,6 @@ def report_lineage(self, path, unused_lineage, level=None):
     Unless override by FileSystem implementations, default to no-op.
     """
     pass
+
+  def check_splittability(self, path):
+    return True

Review Comment:
   This should probably not always be true. If this is a default, perhaps it should not have a default but be abstract and we implement for various filesystems. If it is the default, comment so we understand that is why it ignores the argument.
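
For illustration only, here is a minimal sketch of the "abstract, implemented per filesystem" option raised in the `filesystem.py` comment. The class names `FileSystemSketch`, `LocalFileSystemSketch`, and `GcsFileSystemSketch`, and the in-memory metadata dict, are hypothetical stand-ins and are not Beam's actual classes or the GCS client API. The gzip rule is an assumption taken from the condition visible in the diff, and it does not address the open content-type question.

```python
import abc


class FileSystemSketch(abc.ABC):
  """Hypothetical stand-in for a filesystem base class (not Beam's real one)."""

  @abc.abstractmethod
  def check_splittability(self, path):
    """Returns True if the file at `path` may be read in independent splits."""
    raise NotImplementedError


class LocalFileSystemSketch(FileSystemSketch):
  def check_splittability(self, path):
    # In this sketch local files are always treated as splittable, so the
    # argument is intentionally unused.
    del path
    return True


class GcsFileSystemSketch(FileSystemSketch):
  def __init__(self, metadata_by_path):
    # Hypothetical in-memory lookup standing in for a per-object status call.
    self._metadata_by_path = metadata_by_path

  def check_splittability(self, path):
    metadata = self._metadata_by_path.get(path, {})
    # Assumption for illustration: a gzip content encoding marks the object as
    # not safely splittable. Other conditions (e.g. content type) are omitted.
    return metadata.get('content_encoding') != 'gzip'
```

With the method abstract, a filesystem that forgets to implement it fails at instantiation instead of silently returning `True`. If a default is kept, a comment like the one in `LocalFileSystemSketch` makes the ignored argument explicit.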