stankiewicz opened a new pull request, #33780: URL: https://github.com/apache/beam/pull/33780
For Dataflow V2, StateBackedIterable is iterated by readers after a GBK shuffle, for example by a ParDo after a GBK or by merging combiners after a GBK. [metrics.proto](https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/metrics.proto#L244) specifies that sampling is used because calculating the byte count involves serializing the elements, which is CPU intensive. StateBackedIterable currently does no sampling, so every element is serialized for the byte count, which hurts the performance of pipelines with expensive coders. This change introduces sampling.

Fully fixes #33620; the previous fix was an improvement but did not resolve the issue completely.
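
To illustrate why sampling helps, here is a minimal sketch of a sampled byte counter. It is not the code in this PR and does not use any Beam API: `SampledByteCounter`, `counted`, the `sample_every` parameter, and the fixed every-Nth sampling policy are all hypothetical and chosen only for illustration. The idea is to encode a subset of the elements and extrapolate the total from the mean sampled size.

```python
from typing import Any, Callable, Iterable, Iterator


class SampledByteCounter:
    """Hypothetical helper: estimates total encoded bytes from a sample."""

    def __init__(self, encode: Callable[[Any], bytes], sample_every: int = 10):
        if sample_every < 1:
            raise ValueError("sample_every must be >= 1")
        self._encode = encode
        self._sample_every = sample_every
        self.elements_seen = 0
        self.elements_sampled = 0
        self.sampled_bytes = 0

    def observe(self, element: Any) -> None:
        # Only every Nth element pays the (possibly expensive) encoding cost.
        if self.elements_seen % self._sample_every == 0:
            self.sampled_bytes += len(self._encode(element))
            self.elements_sampled += 1
        self.elements_seen += 1

    def estimated_total_bytes(self) -> int:
        # Extrapolate: mean sampled size times the number of elements seen.
        if self.elements_sampled == 0:
            return 0
        mean_size = self.sampled_bytes / self.elements_sampled
        return round(mean_size * self.elements_seen)


def counted(values: Iterable[Any], counter: SampledByteCounter) -> Iterator[Any]:
    """Yields values unchanged while feeding them to the counter."""
    for value in values:
        counter.observe(value)
        yield value


# Usage: count how often the stand-in "expensive coder" actually runs.
calls = {"n": 0}


def expensive_encode(value: str) -> bytes:
    calls["n"] += 1
    return value.encode("utf-8")


counter = SampledByteCounter(expensive_encode, sample_every=10)
values = ["abcd"] * 100
consumed = list(counted(values, counter))
```

In this sketch the encoder runs for 10 of the 100 elements, while the estimate still covers all of them. The estimate is exact here only because every element has the same size; with varying sizes it is an approximation.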
