glasser commented on issue #7048: Make IngestSegmentFirehoseFactory splittable for parallel ingestion
URL: https://github.com/apache/incubator-druid/pull/7048#issuecomment-464312575

> So I think the algorithm would be something like: list the segments for the whole interval as a timeline. Select the first segment, and take the set of all segments that overlap it, transitively. If this set of segments has more than one interval, then all of those segments are constrained to go in the same subtask. Otherwise, each of the segments in this set (all of which are for the same interval) may go in their own subtask.

OK, I think this was overcomplicating it.

An easier way to solve the overlapping segment problem is just to make the task specify directly to the subtask a `List<WindowedSegment>`, where a `WindowedSegment` is a class that holds a segment ID (String) and a `List<Interval>`. This can be read off directly from the full timeline when calculating splits. The subtask won't need to recalculate a timeline at all: it'll just set up some `WindowedStorageAdapter`s to read the appropriate parts of the segments.
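To make the proposal concrete, here is a minimal, dependency-free sketch of how a `List<WindowedSegment>` might be read off a timeline. It is not Druid code: `Interval` stands in for Joda-Time's `Interval` (half-open, in milliseconds), `TimelineEntry` is a hypothetical simplification of one visible chunk of the timeline, and `fromTimeline` is an illustrative helper name. Only `WindowedSegment` and its two fields come from the comment above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WindowedSegmentSketch {

    /** Hypothetical stand-in for Joda-Time's Interval: half-open [start, end). */
    static final class Interval {
        final long start;
        final long end;

        Interval(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical: one visible chunk of the timeline and the segment that serves it. */
    static final class TimelineEntry {
        final Interval interval;
        final String segmentId;

        TimelineEntry(Interval interval, String segmentId) {
            this.interval = interval;
            this.segmentId = segmentId;
        }
    }

    /** A segment ID plus the windows of that segment a subtask should read. */
    static final class WindowedSegment {
        final String segmentId;
        final List<Interval> intervals;

        WindowedSegment(String segmentId, List<Interval> intervals) {
            this.segmentId = segmentId;
            this.intervals = intervals;
        }
    }

    /**
     * Groups the timeline's visible chunks by segment ID. A segment that is
     * partly overshadowed appears in several chunks, so it ends up with
     * several windows. Segments are returned in first-seen order.
     */
    static List<WindowedSegment> fromTimeline(List<TimelineEntry> timeline) {
        Map<String, List<Interval>> bySegment = new LinkedHashMap<>();
        for (TimelineEntry entry : timeline) {
            bySegment
                .computeIfAbsent(entry.segmentId, id -> new ArrayList<>())
                .add(entry.interval);
        }
        List<WindowedSegment> result = new ArrayList<>();
        for (Map.Entry<String, List<Interval>> e : bySegment.entrySet()) {
            result.add(new WindowedSegment(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

For example, if segment `A` covers `[0, 30)` and a newer segment `B` overshadows `[10, 20)`, the timeline has three chunks (`A`, `B`, `A`), and the helper yields `A` with windows `[0, 10)` and `[20, 30)` plus `B` with window `[10, 20)`. Each `WindowedSegment` can then be assigned to a subtask without the subtask rebuilding the timeline.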
