[GitHub] josephglanville commented on issue #5492: Native parallel batch indexing without shuffle

GitBox Sat, 04 Aug 2018 23:07:27 -0700

josephglanville commented on issue #5492: Native parallel batch indexing 
without shuffle
URL: https://github.com/apache/incubator-druid/pull/5492#issuecomment-410498288
 
 
   @jihoonson if I understand the semantics correctly, to create segments 
with perfect rollup you can return input splits that map all of the data for 
each output segment to exactly one subtask?
   i.e. have getSplits return `Stream<InputSplit<List<SplitType>>>` and have 
withSplit take `InputSplit<List<SplitType>>`, where getSplits returns splits 
that are partitioned by output segment intervals.
   
   My goal with this line of thinking is to remove the need for the 
merging/shuffle phase. As long as the number of files you need to read per 
segment isn't too large for a single subtask, this seems like a reasonable 
approach?
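
   A rough sketch of the grouping idea, to make the question concrete. 
Everything here is hypothetical: `InputSplit` is a minimal stand-in and not 
Druid's actual class, `TimedFile` and `segmentGranularityMillis` are invented 
for illustration, and the sketch assumes each input file falls entirely inside 
one output segment interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

final class SegmentAlignedSplits {
    // Hypothetical stand-in for an input split wrapper; NOT Druid's real class.
    static final class InputSplit<T> {
        private final T value;

        InputSplit(T value) {
            this.value = value;
        }

        T get() {
            return value;
        }
    }

    // Hypothetical input file; assumes all of its rows share one segment interval.
    static final class TimedFile {
        final String path;
        final long timestampMillis;

        TimedFile(String path, long timestampMillis) {
            this.path = path;
            this.timestampMillis = timestampMillis;
        }
    }

    // Returns one split per output segment interval, so a single subtask
    // sees all of the data for its segment and no merge/shuffle is needed.
    static Stream<InputSplit<List<TimedFile>>> getSplits(
            List<TimedFile> files, long segmentGranularityMillis) {
        // TreeMap keeps the splits ordered by interval start.
        Map<Long, List<TimedFile>> byInterval = new TreeMap<>();
        for (TimedFile file : files) {
            long intervalStart =
                Math.floorDiv(file.timestampMillis, segmentGranularityMillis)
                    * segmentGranularityMillis;
            byInterval.computeIfAbsent(intervalStart, k -> new ArrayList<>()).add(file);
        }
        return byInterval.values().stream().map(group -> new InputSplit<>(group));
    }
}
```

   With splits shaped like this, each subtask would receive the full file 
list for one interval via withSplit. A file that spans two intervals breaks the 
assumption and would still need some form of merging.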


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org
