chamikaramj commented on pull request #14811: URL: https://github.com/apache/beam/pull/14811#issuecomment-894440199
I think this PR in its current form does not add much value (and could even be a regression), since it pushes initial splitting into dynamic splitting. You can avoid the regression by using the "splitRestriction" function to perform the initial split into partitions: https://beam.apache.org/documentation/programming-guide/#sdf-basics

Even better would be a single SDF that combines "GeneratePartitionsFn" and "ReadFromPartitionFn", with the logic of "GeneratePartitionsFn" pushed into "splitRestriction".

Another optimization might be to not split all the way within "splitRestriction", but to split into a set of "partition groups" and then split these groups further during dynamic splitting if needed. I'm not sure what a desirable group size would be, though; @nielm might have a better idea. This would help us avoid a large number of empty shards caused by empty partitions, which I believe is an issue today.
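
To make the "partition groups" idea concrete, here is a rough sketch in plain Python. It does not use the Beam API: `initial_split`, `try_split`, and the group size of 4 are all hypothetical names and values picked for illustration, and a restriction is assumed to be a half-open index range over the list of partitions. Initial splitting produces coarse groups, and dynamic splitting hands off the unclaimed tail of a group only when asked.

```python
from typing import List, Optional, Tuple

# Assumption: a restriction is a half-open range [start, end) of partition indices.
Restriction = Tuple[int, int]


def initial_split(num_partitions: int, group_size: int) -> List[Restriction]:
    """Stand-in for initial splitting: emit coarse partition groups
    instead of one restriction per partition."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [
        (start, min(start + group_size, num_partitions))
        for start in range(0, num_partitions, group_size)
    ]


def try_split(
    restriction: Restriction, position: int
) -> Optional[Tuple[Restriction, Restriction]]:
    """Stand-in for dynamic splitting: `position` is the index of the
    partition currently being read. Returns (primary, residual), or None
    when no unclaimed partitions are left to hand off."""
    start, end = restriction
    first_unclaimed = position + 1
    remaining = end - first_unclaimed
    if remaining < 1:
        return None
    # Keep the first half of the unclaimed work, give away the rest.
    mid = first_unclaimed + remaining // 2
    return (start, mid), (mid, end)


if __name__ == "__main__":
    groups = initial_split(num_partitions=10, group_size=4)
    print(groups)
    print(try_split(groups[0], position=0))
```

With grouping, a run of empty partitions costs one cheap pass inside a group rather than one empty shard each, and the runner can still break up a group that turns out to be large.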
