Abacn commented on issue #20137: URL: https://github.com/apache/beam/issues/20137#issuecomment-1324331946
Another duplicated blob-listing operation happens here:

https://github.com/apache/beam/blob/9da27671cdc8b3df2c548d92a4b2e34f5e0aaa0f/sdks/python/apache_beam/io/filebasedsource.py#L144

and here:

https://github.com/apache/beam/blob/9da27671cdc8b3df2c548d92a4b2e34f5e0aaa0f/sdks/python/apache_beam/io/filebasedsource.py#L202

For FileBasedSource, get_range_tracker first calls _get_concat_source, which fetches the file list once. estimate_size then performs a second fetch. (If validate is set to True, there is one more fetch on top of that.)
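
To make the call pattern concrete, here is a minimal toy model of it. This is only a sketch and not Beam's code: CountingLister, NaiveSource, and CachingSource are hypothetical names invented for illustration, and the memoized variant is just one possible way to avoid the repeated listings, not necessarily what a fix in Beam would look like.

```python
import fnmatch


class CountingLister:
    """Hypothetical stand-in for a remote blob-listing call; counts invocations."""

    def __init__(self, files):
        self._files = dict(files)  # path -> size in bytes
        self.calls = 0

    def match(self, pattern):
        # Each call models one (expensive) listing request to the blob store.
        self.calls += 1
        return [
            (path, size)
            for path, size in sorted(self._files.items())
            if fnmatch.fnmatchcase(path, pattern)
        ]


class NaiveSource:
    """Toy source that lists files again in every method (assumed pattern)."""

    def __init__(self, lister, pattern, validate=True):
        self._lister = lister
        self._pattern = pattern
        if validate and not self._list_files():
            raise IOError("No files found for pattern: %s" % pattern)

    def _list_files(self):
        return self._lister.match(self._pattern)

    def get_range_tracker(self):
        # Models building the concatenated source: needs the file list.
        files = self._list_files()
        return (0, len(files))

    def estimate_size(self):
        # Lists again just to sum the sizes.
        return sum(size for _, size in self._list_files())


class CachingSource(NaiveSource):
    """Same toy source, but the listing result is memoized after the first call."""

    def __init__(self, lister, pattern, validate=True):
        self._cached = None
        super().__init__(lister, pattern, validate)

    def _list_files(self):
        if self._cached is None:
            self._cached = self._lister.match(self._pattern)
        return self._cached


if __name__ == "__main__":
    files = {"gs://bucket/data-1.txt": 10, "gs://bucket/data-2.txt": 20}
    for cls in (NaiveSource, CachingSource):
        lister = CountingLister(files)
        source = cls(lister, "gs://bucket/data-*", validate=True)
        source.get_range_tracker()
        source.estimate_size()
        print(cls.__name__, "listing calls:", lister.calls)
```

In this toy model the naive source issues one listing per method plus one for validation, while the memoized variant issues a single listing. A real cache would also have to consider staleness if files can appear or disappear between calls, which this sketch ignores.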
