glasser commented on issue #7048: Make IngestSegmentFirehoseFactory splittable 
for parallel ingestion
URL: https://github.com/apache/incubator-druid/pull/7048#issuecomment-464312575
 
 
   > So I think the algorithm would be something like: list the segments for 
the whole interval as a timeline. Select the first segment, and take the set of 
all segments that overlap it, transitively. If this set of segments has more 
than one interval, then all of those segments are constrained to go in the same 
subtask. Otherwise, each of the segments in this set (all of which are for the 
same interval) may go in their own subtask.
   
   OK, I think this was overcomplicating it. An easier way to solve the 
overlapping segment problem is to have the task pass each subtask a 
`List<WindowedSegment>` directly, where a WindowedSegment is a class that 
holds a segment ID (String) and a `List<Interval>`. This can be read off 
directly from the full timeline when calculating splits.
   
   The sub-task won't need to recalculate a timeline at all: it'll just set up 
some WindowedStorageAdapters to read the appropriate parts of the segments.
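
   To make the idea concrete, here is a rough sketch of what I have in mind. 
It is not the actual Druid code: `Interval` below is a hypothetical stand-in 
for the Joda class, and `TimelineEntry` and `WindowedSegments.fromTimeline` 
are made-up names that assume the timeline has already been flattened into 
(visible interval, segment ID) pairs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.joda.time.Interval: half-open [startMillis, endMillis).
final class Interval {
    final long startMillis;
    final long endMillis;

    Interval(long startMillis, long endMillis) {
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }
}

// The class proposed above: a segment ID plus the parts of that segment to read.
final class WindowedSegment {
    final String segmentId;
    final List<Interval> intervals;

    WindowedSegment(String segmentId, List<Interval> intervals) {
        this.segmentId = segmentId;
        // Defensive copy so callers cannot mutate the windows afterwards.
        this.intervals = Collections.unmodifiableList(new ArrayList<>(intervals));
    }
}

// Assumed, simplified view of one timeline entry: a visible interval
// and the ID of a segment that serves it.
final class TimelineEntry {
    final Interval visibleInterval;
    final String segmentId;

    TimelineEntry(Interval visibleInterval, String segmentId) {
        this.visibleInterval = visibleInterval;
        this.segmentId = segmentId;
    }
}

final class WindowedSegments {
    // Groups the visible intervals by segment ID, keeping first-seen order,
    // so each segment carries exactly the windows a subtask should read.
    static List<WindowedSegment> fromTimeline(List<TimelineEntry> timeline) {
        Map<String, List<Interval>> bySegment = new LinkedHashMap<>();
        for (TimelineEntry entry : timeline) {
            bySegment
                .computeIfAbsent(entry.segmentId, k -> new ArrayList<>())
                .add(entry.visibleInterval);
        }
        List<WindowedSegment> result = new ArrayList<>();
        for (Map.Entry<String, List<Interval>> e : bySegment.entrySet()) {
            result.add(new WindowedSegment(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

   For example, if segment "A" is overshadowed in the middle by a newer 
segment "B", the timeline shows "A" on both sides of "B", and "A" ends up 
as one WindowedSegment with two intervals. The resulting list can then be 
divided among subtasks however we like, since each entry already says which 
parts of its segment to read.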
