[
https://issues.apache.org/jira/browse/BEAM-11996?focusedWorklogId=635435&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635435
]
ASF GitHub Bot logged work on BEAM-11996:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Aug/21 18:25
Start Date: 06/Aug/21 18:25
Worklog Time Spent: 10m
Work Description: chamikaramj commented on pull request #14811:
URL: https://github.com/apache/beam/pull/14811#issuecomment-894440199
I think this PR in its current form does not add much value (and could even
be a regression) since it pushes initial splitting into dynamic splitting.
You can avoid the regression by using the "splitRestriction" function to
perform initial splitting into partitions:
https://beam.apache.org/documentation/programming-guide/#sdf-basics
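Below is a minimal Java sketch of the initial splitting suggested above. It has no Beam dependency and assumes the restriction is a half-open range of partition indices, roughly what Beam's OffsetRange represents. In an actual SDF this logic would live in a method annotated @SplitRestriction that emits each sub-range through its OutputReceiver. InitialSplitSketch, PartitionRange, splitPerPartition and the partition count are hypothetical and are used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Dependency-free sketch of eager initial splitting. PartitionRange and
 * splitPerPartition are hypothetical names; a real SDF would do this inside
 * a @SplitRestriction method and emit the pieces via its OutputReceiver.
 */
public class InitialSplitSketch {

  /** Half-open range [from, to) of partition indices, similar in spirit to OffsetRange. */
  static final class PartitionRange {
    final long from;
    final long to;

    PartitionRange(long from, long to) {
      if (from > to) {
        throw new IllegalArgumentException("from must be <= to");
      }
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return "[" + from + ", " + to + ")";
    }
  }

  /**
   * Emits one sub-restriction per partition, so the runner can distribute
   * partitions up front instead of discovering them through dynamic splits.
   */
  static List<PartitionRange> splitPerPartition(PartitionRange whole) {
    List<PartitionRange> out = new ArrayList<>();
    for (long i = whole.from; i < whole.to; i++) {
      out.add(new PartitionRange(i, i + 1));
    }
    return out;
  }

  public static void main(String[] args) {
    // Assume (hypothetically) that the partition query returned 4 partitions.
    List<PartitionRange> splits = splitPerPartition(new PartitionRange(0, 4));
    System.out.println(splits); // [[0, 1), [1, 2), [2, 3), [3, 4)]
  }
}
```

Because each partition becomes its own restriction before processing starts, the runner does not have to rely on dynamic splitting to achieve initial parallelism.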
It would be even better to add a single SDF that combines "GeneratePartitionsFn"
and "ReadFromPartitionFn", with the logic of "GeneratePartitionsFn" pushed into
"splitRestriction". Another optimization might be to not split all the way
within "splitRestriction", but instead to split into a set of "partition groups"
and then further split these partition groups during dynamic splitting if needed.
I'm not sure what a desirable grouping size would be, though; @nielm might have
a better idea on that. Grouping should help us avoid a large number of empty
shards due to empty partitions, which I believe is an issue today.
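Below is a rough Java sketch of the "partition groups" idea. It has no Beam dependency. The initial split emits coarse groups of partition indices, and a group is split further only when the runner requests a dynamic split. The split arithmetic is loosely modeled on how Beam's OffsetRangeTracker divides the unclaimed remainder of a range, but it is not that class. PartitionGroupSketch, Range, splitIntoGroups, trySplit and the group size of 4 are hypothetical. As noted above, the right group size is still an open question.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Dependency-free sketch: coarse partition groups at initial split time,
 * finer splits of a group on demand. All names here are hypothetical.
 */
public class PartitionGroupSketch {

  /** Half-open range [from, to) of partition indices. */
  static final class Range {
    final long from;
    final long to;

    Range(long from, long to) {
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return "[" + from + ", " + to + ")";
    }
  }

  /** Initial split: groups of at most groupSize partitions each. */
  static List<Range> splitIntoGroups(Range whole, long groupSize) {
    if (groupSize <= 0) {
      throw new IllegalArgumentException("groupSize must be positive");
    }
    List<Range> groups = new ArrayList<>();
    for (long start = whole.from; start < whole.to; start += groupSize) {
      groups.add(new Range(start, Math.min(start + groupSize, whole.to)));
    }
    return groups;
  }

  /**
   * Dynamic split of a group after partition lastClaimed was claimed
   * (pass group.from - 1 if nothing is claimed yet). Keeps about
   * fractionOfRemainder of the unclaimed partitions in the primary.
   * Returns {primary, residual}, or null if there is nothing to hand off.
   */
  static Range[] trySplit(Range group, long lastClaimed, double fractionOfRemainder) {
    if (fractionOfRemainder < 0 || fractionOfRemainder > 1) {
      throw new IllegalArgumentException("fractionOfRemainder must be in [0, 1]");
    }
    long unclaimed = group.to - (lastClaimed + 1);
    long keep = (long) Math.floor(unclaimed * fractionOfRemainder);
    long splitPos = lastClaimed + 1 + keep;
    if (splitPos >= group.to) {
      return null; // no unclaimed partitions left to give away
    }
    return new Range[] {new Range(group.from, splitPos), new Range(splitPos, group.to)};
  }

  public static void main(String[] args) {
    // Assume 10 partitions and a hypothetical group size of 4.
    List<Range> groups = splitIntoGroups(new Range(0, 10), 4);
    System.out.println(groups); // [[0, 4), [4, 8), [8, 10)]

    // The runner asks to split the first group in half while partition 0 is read.
    Range[] parts = trySplit(groups.get(0), 0, 0.5);
    System.out.println(parts[0] + " " + parts[1]); // [0, 2) [2, 4)
  }
}
```

With 10 partitions and a group size of 4, the initial split yields three restrictions instead of ten. Runs of empty partitions therefore produce fewer empty shards, and a group that turns out to be busy can still be split further at runtime.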
Issue Time Tracking
-------------------
Worklog Id: (was: 635435)
Time Spent: 5h 50m (was: 5h 40m)
> Implement SpannerIO on top of Splittable DoFn
> ---------------------------------------------
>
> Key: BEAM-11996
> URL: https://issues.apache.org/jira/browse/BEAM-11996
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Boyuan Zhang
> Assignee: Miguel Anzo
> Priority: P2
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)