shunping commented on issue #30166: URL: https://github.com/apache/beam/issues/30166#issuecomment-1921515109
I checked the code in Beam 2.52 and 2.53. Here is what I found:

- In both versions, `list_prefix()` returns a **dictionary** that maps a file name to its metadata: https://github.com/apache/beam/blob/release-2.52.0/sdks/python/apache_beam/io/gcp/gcsio.py#L586 and https://github.com/apache/beam/blob/release-2.53.0/sdks/python/apache_beam/io/gcp/gcsio.py#L437
- In both versions, `delete_batch()` takes a **list** of file patterns/names as its input: https://github.com/apache/beam/blob/release-2.52.0/sdks/python/apache_beam/io/gcp/gcsio.py#L291 and https://github.com/apache/beam/blob/release-2.53.0/sdks/python/apache_beam/io/gcp/gcsio.py#L206

In other words, neither function signature has changed.

I think the way it was called before is not the right way to call `delete_batch`. It happened to work in 2.52 because of an internal implementation detail: calling `iter()` on a dictionary returns an iterator over its keys (https://github.com/apache/beam/blob/release-2.52.0/sdks/python/apache_beam/io/gcp/gcsio.py#L301), and the list of keys is then built from that iterator (https://github.com/apache/beam/blob/release-2.52.0/sdks/python/apache_beam/io/gcp/gcsio.py#L304). Whether `itertools.islice` is used or not makes no difference.
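
To illustrate the dictionary behavior described above, here is a minimal sketch in plain Python. It does not call Beam at all: `listing` is a hypothetical stand-in for the kind of mapping `list_prefix()` returns (file name to metadata), and the bucket paths and sizes are made up for the example.

```python
from itertools import islice

# Hypothetical stand-in for a list_prefix() result: file name -> metadata.
# The paths and sizes are invented for illustration only.
listing = {
    "gs://example-bucket/a.txt": 10,
    "gs://example-bucket/b.txt": 20,
    "gs://example-bucket/c.txt": 30,
}

# iter() on a dict yields its keys, so slicing the iterator produces
# batches of file names and silently drops the metadata values.
keys_iter = iter(listing)
first_batch = list(islice(keys_iter, 2))
second_batch = list(islice(keys_iter, 2))

# Making the conversion explicit gives the same names without relying
# on how the callee happens to iterate over its argument.
paths = list(listing.keys())

print(first_batch)
print(second_batch)
print(paths)
```

Assuming `delete_batch()` expects a list of paths as noted above, passing something like `paths` rather than the dictionary itself would keep the call independent of the function's internals.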
