Abacn commented on code in PR #17604:
URL: https://github.com/apache/beam/pull/17604#discussion_r881976544


##########
sdks/python/apache_beam/io/filebasedsource.py:
##########
@@ -338,15 +338,25 @@ def default_output_coder(self):
 
 class _ExpandIntoRanges(DoFn):
   def __init__(
-      self, splittable, compression_type, desired_bundle_size, min_bundle_size):
+      self,
+      splittable,
+      compression_type,
+      desired_bundle_size,
+      min_bundle_size,
+      do_match=True):
     self._desired_bundle_size = desired_bundle_size
     self._min_bundle_size = min_bundle_size
     self._splittable = splittable
     self._compression_type = compression_type
+    self._do_match = do_match
 
   def process(self, element, *args, **kwargs):
-    match_results = FileSystems.match([element])
-    for metadata in match_results[0].metadata_list:
+    if self._do_match:

Review Comment:
   Here the passed-in element can be either a file name or a FileMetadata. If a FileMetadata is passed in, there is no need to match it again.
   
   We should be able to eliminate the _do_match flag and simply check with isinstance(...) at run time. We should also add a type hint for the process method.
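A minimal sketch of the suggested isinstance-based dispatch, assuming a trimmed-down, hypothetical `ExpandSketch` class in place of `_ExpandIntoRanges` (no `DoFn` base class, no range splitting). `FileMetadata` here is a stand-in dataclass modelling only `path` and `size_in_bytes`, and `match_fn` is an injected helper; in the real DoFn it would presumably wrap `FileSystems.match([element])[0].metadata_list`, as in the removed lines of the diff.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Union


@dataclass(frozen=True)
class FileMetadata:
  # Hypothetical stand-in for apache_beam.io.filesystem.FileMetadata;
  # only the fields this sketch needs are modelled.
  path: str
  size_in_bytes: int


class ExpandSketch:
  """Hypothetical stand-in for _ExpandIntoRanges showing only the
  isinstance check that replaces the do_match flag."""

  def __init__(self, match_fn: Callable[[str], List[FileMetadata]]):
    # Assumed helper that resolves a file name or glob to metadata.
    self._match_fn = match_fn

  def process(
      self, element: Union[str, FileMetadata], *args,
      **kwargs) -> Iterator[FileMetadata]:
    if isinstance(element, FileMetadata):
      # Already matched upstream: no need to match again.
      yield element
    else:
      # A file name or pattern: match it to obtain metadata.
      yield from self._match_fn(element)


# Usage with an in-memory fake file system (illustrative data only).
fake_fs = {
    "gs://bucket/*.txt": [
        FileMetadata("gs://bucket/a.txt", 10),
        FileMetadata("gs://bucket/b.txt", 20),
    ]
}
dofn = ExpandSketch(lambda pattern: fake_fs.get(pattern, []))
print([m.path for m in dofn.process("gs://bucket/*.txt")])
# ['gs://bucket/a.txt', 'gs://bucket/b.txt']
print([m.path for m in dofn.process(FileMetadata("gs://bucket/c.txt", 5))])
# ['gs://bucket/c.txt']
```

Dispatching on the element type keeps the constructor signature unchanged and lets the same DoFn accept both raw patterns and pre-matched metadata without the caller having to pick a flag.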



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
