chamikaramj commented on pull request #14811:
URL: https://github.com/apache/beam/pull/14811#issuecomment-894440199


   I think this PR in its current form does not add much value (and could even 
be a regression), since it pushes initial splitting into dynamic splitting.
   
   You can avoid the regression by using the "splitRestriction" function to 
perform initial splitting into partitions: 
https://beam.apache.org/documentation/programming-guide/#sdf-basics
   
   Even better would be a single SDF that combines "GeneratePartitionsFn" 
and "ReadFromPartitionFn", with the logic of "GeneratePartitionsFn" pushed into 
"splitRestriction". Another optimization might be to not split all the way 
within "splitRestriction", but to split into a set of "partition groups" and 
then further split these partition groups during dynamic splitting if needed. 
I'm not sure what a desirable grouping size would be, though. @nielm might have 
a better idea on that. This would help us avoid a large number of empty shards 
due to empty partitions, which I believe is an issue today.
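
   To make the "partition groups" idea concrete, here is a rough sketch in 
plain Java with no Beam dependencies. Everything in it is hypothetical: the 
class name "PartitionGrouping", the methods "initialSplit" and "splitGroup", 
and the group size of 3 are made up for illustration and are not part of Beam 
or SpannerIO. In a real SDF, the first step would presumably live in 
"splitRestriction" and the second in the restriction tracker's split handling.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating partition groups; not a Beam API. */
public class PartitionGrouping {

  /** Initial split: chunk the partitions into groups of at most groupSize. */
  public static <T> List<List<T>> initialSplit(List<T> partitions, int groupSize) {
    if (groupSize <= 0) {
      throw new IllegalArgumentException("groupSize must be positive");
    }
    List<List<T>> groups = new ArrayList<>();
    for (int start = 0; start < partitions.size(); start += groupSize) {
      int end = Math.min(start + groupSize, partitions.size());
      groups.add(new ArrayList<>(partitions.subList(start, end)));
    }
    return groups;
  }

  /**
   * Dynamic split: divide one group into a primary (first half) and a
   * residual (the rest). A group with fewer than two partitions is
   * returned unsplit as a single-element result.
   */
  public static <T> List<List<T>> splitGroup(List<T> group) {
    List<List<T>> result = new ArrayList<>();
    if (group.size() < 2) {
      result.add(new ArrayList<>(group));
      return result;
    }
    int mid = group.size() / 2;
    result.add(new ArrayList<>(group.subList(0, mid)));
    result.add(new ArrayList<>(group.subList(mid, group.size())));
    return result;
  }

  public static void main(String[] args) {
    List<String> partitions = new ArrayList<>();
    for (int i = 0; i < 7; i++) {
      partitions.add("p" + i);
    }

    // Initial splitting yields 3 groups instead of 7 single-partition shards.
    List<List<String>> groups = initialSplit(partitions, 3);
    System.out.println(groups); // [[p0, p1, p2], [p3, p4, p5], [p6]]

    // Only if the runner asks for more parallelism is a group split further.
    System.out.println(splitGroup(groups.get(0))); // [[p0], [p1, p2]]
  }
}
```

   With grouping like this, an empty partition only costs one entry inside a 
group rather than a whole shard of its own. The right group size is still an 
open question.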

