jihoonson commented on issue #7048: Make IngestSegmentFirehoseFactory 
splittable for parallel ingestion
URL: https://github.com/apache/incubator-druid/pull/7048#issuecomment-462515423
 
 
   @glasser thanks for working on this! I have one comment.
   
   As in other distributed systems like Hadoop or Spark, it's important to evenly distribute work in parallel indexing, because the entire ingestion completes only when the longest-running task finishes. This means each subtask should process roughly the same amount of data.
   In this PR, it looks like the input segments are split based on `taskGranularity`. But, in practice, each time chunk can hold a different amount of data. What do you think about splitting based on segment size instead, so that each task processes a similar total size of segments?
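   To illustrate the idea (this is just a sketch, not Druid's actual API or this PR's implementation): a greedy longest-first heuristic assigns each segment, largest first, to the currently lightest group, which keeps the per-task totals close even when segment sizes vary. The class and method names below are hypothetical, and segment sizes are illustrative byte counts.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Comparator;
   import java.util.List;

   public class SegmentSplitter {
       // Split segments (represented here only by their sizes) into
       // numTasks groups with roughly equal total size, using a greedy
       // largest-first assignment to the currently smallest group.
       static List<List<Long>> splitBySize(List<Long> segmentSizes, int numTasks) {
           List<List<Long>> groups = new ArrayList<>();
           long[] totals = new long[numTasks];
           for (int i = 0; i < numTasks; i++) {
               groups.add(new ArrayList<>());
           }
           List<Long> sorted = new ArrayList<>(segmentSizes);
           sorted.sort(Comparator.reverseOrder());
           for (long size : sorted) {
               int min = 0;
               for (int i = 1; i < numTasks; i++) {
                   if (totals[i] < totals[min]) {
                       min = i;
                   }
               }
               groups.get(min).add(size);
               totals[min] += size;
           }
           return groups;
       }

       public static void main(String[] args) {
           List<Long> sizes = Arrays.asList(500L, 200L, 300L, 100L, 400L);
           List<List<Long>> groups = splitBySize(sizes, 2);
           long s0 = groups.get(0).stream().mapToLong(Long::longValue).sum();
           long s1 = groups.get(1).stream().mapToLong(Long::longValue).sum();
           System.out.println(s0 + " " + s1);  // group totals come out balanced: 800 700
       }
   }
   ```

   With time-chunk-based splitting, the same five segments could land as 500+400 in one task and 300+200+100 in another only by luck; size-based splitting makes the balance deterministic.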

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
