The thread a couple of days ago about CoGroupByKey possibly being broken in Beam 2.36.0 [1] raised an interesting point that got me thinking. CoGbkResultCoder doesn't override getEncodedElementByteSize (nor does it have any efficient way to compute that size), so sampling the size of a CoGbkResult element requires iterating over the entire result. This can be incredibly expensive: a CoGbkResult can hold an arbitrarily large number of elements (more than could fit in memory), which have to be paged in from shuffle. All of those elements then need to be encoded, even though they otherwise never would have been (since the consumer of the CoGbkResult fuses to the operation itself).
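For context, the expensive path is the generic size-estimation logic in Coder, which (paraphrasing from memory -- the real code lives in the SDK and uses vendored Guava) just encodes the element into a byte-counting stream. A minimal, self-contained illustration of that pattern:

    import com.google.common.io.ByteStreams;
    import com.google.common.io.CountingOutputStream;
    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.coders.StringUtf8Coder;

    public class EncodedSizeSketch {
      // The only generic way to size an element is to encode it and count
      // the bytes; for CoGbkResultCoder that means walking (and paging in
      // from shuffle) every tagged value in the result.
      static <T> long encodedSizeByCounting(Coder<T> coder, T value) throws Exception {
        CountingOutputStream os =
            new CountingOutputStream(ByteStreams.nullOutputStream());
        coder.encode(value, os);
        return os.getCount();
      }

      public static void main(String[] args) throws Exception {
        // Cheap for a small string; arbitrarily expensive for a CoGbkResult.
        System.out.println(encodedSizeByCounting(StringUtf8Coder.of(), "hello"));
      }
    }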
Making CoGbkResultCoder.getEncodedElementByteSize simply return 0 (see the sketch below) provided a significant performance boost in a few of the jobs I tested it with, and in fact fixed some jobs that had been failing with OOMs. The obvious downside is that this estimate will now be wrong in the Dataflow UI, but given that the estimates for things like GBK are also incorrect, do we particularly care? Curious what the community thinks here; IMO the benefit far outweighs the negative.
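For concreteness, the change I tested was essentially the following override inside CoGbkResultCoder (a sketch, not the exact patch I ran):

    // Report a constant size instead of encoding (i.e. fully iterating)
    // the result. Deliberately wrong as an estimate, but avoids paging
    // the whole result in from shuffle just to sample its size.
    @Override
    protected long getEncodedElementByteSize(CoGbkResult value) {
      return 0;
    }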

[1] https://lists.apache.org/thread/5y56kbgm3q0m1byzf7186rrkomrcfldm