aaltay commented on a change in pull request #12088:
URL: https://github.com/apache/beam/pull/12088#discussion_r447358864
##########
File path: sdks/python/apache_beam/transforms/util.py
##########
@@ -741,6 +741,7 @@ def WithKeys(pcoll, k):
 @experimental()
 @typehints.with_input_types(Tuple[K, V])
+@typehints.with_output_types(Tuple[K, List[V]])

Review comment:
I think I understand the difference between Python and Java.

In Java, this works by emitting (Key, Iterable of Values) (here: https://github.com/apache/beam/blob/51bd3a4588984c9680f61a3268513fe5bde3ba96/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java#L212).

In Python, it stores the (K, V) pairs as elements (here: https://github.com/apache/beam/blob/51bd3a4588984c9680f61a3268513fe5bde3ba96/sdks/python/apache_beam/transforms/util.py#L784) and emits them as a list (here: https://github.com/apache/beam/blob/51bd3a4588984c9680f61a3268513fe5bde3ba96/sdks/python/apache_beam/transforms/util.py#L800), resulting in a List[Tuple[Key, Value]].

If my understanding is correct, I can change the Python behavior to match the Java one. I can also add the space sharing optimization.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
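To make the two output shapes in the review comment concrete, here is a minimal plain-Python sketch. It does not use Beam, and the helper names `emit_as_pairs` and `emit_as_keyed_batch` are hypothetical, introduced only for illustration. It assumes a batch has already been buffered for a single key.

```python
from typing import List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def emit_as_pairs(buffered: List[Tuple[K, V]]) -> List[Tuple[K, V]]:
    """Shape described for Python in the comment: the buffered (K, V)
    pairs are emitted together as one list, List[Tuple[K, V]]."""
    return list(buffered)


def emit_as_keyed_batch(key: K, buffered: List[Tuple[K, V]]) -> Tuple[K, List[V]]:
    """Java-like shape: the key appears once, followed by the values
    only, matching the proposed hint Tuple[K, List[V]]."""
    return key, [value for _, value in buffered]


# Hypothetical batch buffered for the key "a".
buffered = [("a", 1), ("a", 2), ("a", 3)]

pairs_batch = emit_as_pairs(buffered)
keyed_batch = emit_as_keyed_batch("a", buffered)

print(pairs_batch)  # [('a', 1), ('a', 2), ('a', 3)]
print(keyed_batch)  # ('a', [1, 2, 3])
```

Only the second shape agrees with the `Tuple[K, List[V]]` output hint added in the diff; the first would need `List[Tuple[K, V]]` instead.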
