jihoonson commented on issue #7048: Make IngestSegmentFirehoseFactory splittable for parallel ingestion URL: https://github.com/apache/incubator-druid/pull/7048#issuecomment-462515423 @glasser thanks for working on this! I have one comment. As in other distributed systems like Hadoop or Spark, it's important to distribute work evenly in parallel indexing, because the entire ingestion completes only when the longest-running task finishes. That means each sub task should process roughly the same amount of data. In this PR, it looks like input segments are split based on `taskGranularity`, but in practice each time chunk can hold a different amount of data. What do you think about splitting based on segment size, so that each task processes a similar volume of segments?
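To illustrate the suggestion, here is a minimal, hypothetical sketch (not Druid's actual API) of size-based splitting using a greedy longest-first heuristic: sort segments by size descending, then repeatedly assign the next segment to the split with the smallest running total. The class and method names are invented for illustration; segment sizes stand in for `DataSegment.getSize()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SizeBalancedSplitter {
  // Distribute segment sizes (bytes) across numSplits so totals are roughly even.
  public static List<List<Long>> split(List<Long> segmentSizes, int numSplits) {
    // Sort descending so the largest segments are placed first (greedy LPT heuristic).
    List<Long> sorted = new ArrayList<>(segmentSizes);
    sorted.sort(Comparator.reverseOrder());

    List<List<Long>> splits = new ArrayList<>();
    for (int i = 0; i < numSplits; i++) {
      splits.add(new ArrayList<>());
    }

    // Min-heap of {currentTotalBytes, splitIndex}: always add to the lightest split.
    PriorityQueue<long[]> heap = new PriorityQueue<>(Comparator.comparingLong(a -> a[0]));
    for (int i = 0; i < numSplits; i++) {
      heap.add(new long[]{0L, i});
    }

    for (long size : sorted) {
      long[] lightest = heap.poll();
      splits.get((int) lightest[1]).add(size);
      lightest[0] += size;
      heap.add(lightest);
    }
    return splits;
  }

  public static void main(String[] args) {
    // Skewed time chunks: one huge chunk and many small ones. Splitting by time
    // chunk would give one task far more data; splitting by size balances totals.
    List<Long> sizes = Arrays.asList(
        900L, 100L, 100L, 100L, 100L, 100L,
        100L, 100L, 100L, 100L, 100L, 100L);
    for (List<Long> s : split(sizes, 3)) {
      long total = s.stream().mapToLong(Long::longValue).sum();
      System.out.println(total + " -> " + s);
    }
  }
}
```

With a time-chunk split over the same skewed data, the task holding the 900-byte chunk would dominate the wall-clock time; the size-based assignment keeps the heaviest split no larger than the single biggest segment here.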
