arunpandianp opened a new pull request, #33318: URL: https://github.com/apache/beam/pull/33318
This is a POC showing how state multiplexing can work for GroupByKey.

- Messages with small keys (<4K) are hashed and shuffled to a fixed set of virtual sharding keys and are reduced on those virtual keys.
- The actual keys are sent to the virtual keys as part of the windows.
- Existing combining and aggregations work at the window level, so they all work out of the box.

The 4K threshold is currently an arbitrary small value and can be tweaked. The constraint is that no state tag may exceed 64K, and keys are now part of the windowed state tags.

I still need to clean up comments and add tests; I'm sending this to share the idea and get initial feedback.

R: @scwhittle
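To make the routing idea concrete, here is a minimal, self-contained Python sketch of the scheme described in this PR. It is not the PR's code and does not use Beam. The names `route`, `group_by_key`, and `NUM_VIRTUAL_SHARDS`, the shard count of 8, the use of SHA-256, and reading the 4K threshold as bytes are all assumptions made for illustration. The inner `sub_key` dictionary level stands in for the window that carries the actual key.

```python
import hashlib
from collections import defaultdict

SMALL_KEY_THRESHOLD = 4 * 1024  # assumed to be bytes; the PR calls this value arbitrary
NUM_VIRTUAL_SHARDS = 8          # hypothetical; the PR does not state a shard count


def route(key: bytes):
    """Return (shuffle_key, sub_key) for one element key."""
    if len(key) < SMALL_KEY_THRESHOLD:
        # Small key: hash it onto one of a fixed set of virtual sharding keys
        # and carry the actual key alongside (the PR puts it in the window).
        digest = hashlib.sha256(key).digest()
        shard = int.from_bytes(digest[:8], "big") % NUM_VIRTUAL_SHARDS
        return ("virtual", shard), key
    # Large key: shuffle on the key itself, with no multiplexing.
    return ("direct", key), None


def group_by_key(pairs):
    """Group (key, value) pairs, multiplexing small keys onto virtual shards."""
    # state[shuffle_key][sub_key] -> values; the sub_key level plays the
    # role of per-window state under each shuffle key.
    state = defaultdict(lambda: defaultdict(list))
    for key, value in pairs:
        shuffle_key, sub_key = route(key)
        state[shuffle_key][sub_key].append(value)

    # Demultiplex: recover the actual key for every group.
    result = {}
    for shuffle_key, per_sub_key in state.items():
        for sub_key, values in per_sub_key.items():
            actual = sub_key if shuffle_key[0] == "virtual" else shuffle_key[1]
            result[actual] = values
    return result
```

Because grouping happens per `sub_key` under each shuffle key, any reduction that already operates per window would see one group per actual key, which is the reason the description gives for existing combiners working unchanged.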
