lostluck commented on issue #32498:
URL: https://github.com/apache/beam/issues/32498#issuecomment-2364528431

   OK, definitely works well for me, but I am also on Google's network, in 
Seattle. 
   It certainly must be made to work smoothly for folks who *aren't* in my 
specific unlikely situation.
   
   Adding a bit more debugging tells me the following: 
   
   * (~200ms) Time to list the files from the service. Since this transform 
doesn't split, it isn't affected by the current policy. The actual file 
opening and reading happen in a different bundle.
   * (~200us) Time from StartBundle to reach ProcessElement. Negligible.
   * (~100ms) Time to actually open the file for reading.
   
   Prism's current default split policy is to request progress (and similar) 
only every ~100ms. If there has been *any* progress since the previous request, 
measured either by the channel counter or by downstream element emissions, then 
it *will not split*. This lets it split only when processing is slow (indicated 
by ~100-200ms in which the counts have not moved).
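
   As a rough illustration of the rule above, here is a minimal Go sketch. 
It is not Prism's actual code: the `progressSnapshot` type, its fields, and 
`shouldSplit` are hypothetical names invented for this example. It only shows 
the "no movement in either counter means split" decision.

```go
package main

// progressSnapshot is a hypothetical record of what one progress response
// reported for a running bundle. The type and field names are assumptions
// made for this sketch, not Prism's real data structures.
type progressSnapshot struct {
	channelCount int64 // elements counted on the data channel so far
	emittedCount int64 // elements emitted by downstream transforms so far
}

// shouldSplit reports whether a split should be requested at this tick.
// Any movement in either counter since the previous tick counts as progress
// and suppresses the split; only a fully stalled bundle is split.
func shouldSplit(prev, cur progressSnapshot) bool {
	progressed := cur.channelCount != prev.channelCount ||
		cur.emittedCount != prev.emittedCount
	return !progressed
}
```

   With a ~100ms tick, a bundle whose counters are identical across two 
consecutive snapshots would be split, while one that moved even a single 
element would be left alone.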
   
   Setting the progress ticker to ~10ms gives me behavior similar to the 
reports, which gives me a chance to find something that should work.
   
   The split planning is so simple that it doesn't take into account work 
that has previously been done, so it always waits only a fixed interval for 
work on a given stage.
   
   A more robust view would take into account work "globally" on the job, and 
only split if a stage is "straggling" or similar, but prism shouldn't go that 
far at this time. And we don't want to slow down *all* stages just because one 
needs to be more conservative in how it splits.
   
   I'm now trying out adding a "back off" for a given stage. If a split needs 
to happen, progress requests (and split decisions) are made at a slower rate 
for all new stages. If stages finish before any progress request is made, the 
rate is sped up again. This should even out to some "ideal" rate per stage. For 
this issue, a few "quick" splits should happen, and then the aggression is 
toned down enough for work to complete properly.
   
   This isn't likely to be the final approach to dynamic splitting decisions, 
since it would be best for that to also be tied to the rate of input to output 
and similar. Combining that with better initial splits of the data would 
probably solve most problems.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

