glasser commented on issue #7048: Make IngestSegmentFirehoseFactory splittable for parallel ingestion
URL: https://github.com/apache/incubator-druid/pull/7048#issuecomment-462904598

But doesn't that holder come from the timeline, so it only works if the timeline was constructed from the full set of segments?

I think what this means is that `IngestSegmentFirehoseFactory.connect` needs to *always* call SegmentListAction/VersionedIntervalTimeline on the *full* original interval, even in a subtask. However, it will only call fetchSegments on the selected segments, and it should skip elements of timelineSegments that aren't in the selected segments. This implies that instead of `segments` being an alternate option for `interval`, `interval` always needs to be specified.

Alternatively, the split operation needs to provide both the full list of segments and the split-specific segment list to each split firehose factory. (Or at least include an extra list of overlapping segments.)

(In other news, my TaskToolboxConsumingFirehoseFactory idea runs into trouble because CombiningFirehoseFactory is in druid-server, which doesn't have access to druid-indexing-service's TaskToolbox type. I suppose the interface could declare `setTaskToolbox(Object)` and let IngestSegmentFirehoseFactory do a typecast?)
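To illustrate the timeline concern, here is a minimal, self-contained sketch. It does not use Druid's real classes: `Segment`, `Holder`, `visibleHolders`, and `holdersForSplit` are hypothetical stand-ins, and the toy timeline simply assumes that the highest-version segment wins wherever segments overlap. It shows why building the timeline from only a split's segments can expose rows that a newer segment outside the split overshadows, and how building from the full set and then skipping unselected holders avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SplitTimelineSketch {
    /** Hypothetical stand-in for a segment: half-open interval [start, end) plus a version. */
    static final class Segment {
        final String id;
        final long start;
        final long end;
        final int version;

        Segment(String id, long start, long end, int version) {
            this.id = id;
            this.start = start;
            this.end = end;
            this.version = version;
        }
    }

    /** Hypothetical stand-in for a timeline holder: the visible slice of one segment. */
    static final class Holder {
        final long start;
        final long end;
        final Segment segment;

        Holder(long start, long end, Segment segment) {
            this.start = start;
            this.end = end;
            this.segment = segment;
        }
    }

    /** Toy timeline: on each elementary interval, the highest-version covering segment is visible. */
    static List<Holder> visibleHolders(List<Segment> segments) {
        TreeSet<Long> bounds = new TreeSet<>();
        for (Segment s : segments) {
            bounds.add(s.start);
            bounds.add(s.end);
        }
        List<Holder> holders = new ArrayList<>();
        Long prev = null;
        for (Long b : bounds) {
            if (prev != null) {
                Segment best = null;
                for (Segment s : segments) {
                    boolean covers = s.start <= prev && s.end >= b;
                    if (covers && (best == null || s.version > best.version)) {
                        best = s;
                    }
                }
                if (best != null) {
                    holders.add(new Holder(prev, b, best));
                }
            }
            prev = b;
        }
        return holders;
    }

    /** Build the timeline from the FULL segment set, then skip holders outside this split. */
    static List<Holder> holdersForSplit(List<Segment> fullSet, Set<String> selectedIds) {
        List<Holder> result = new ArrayList<>();
        for (Holder h : visibleHolders(fullSet)) {
            if (selectedIds.contains(h.segment.id)) {
                result.add(h);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Segment older = new Segment("A", 0L, 10L, 1);
        Segment newer = new Segment("B", 5L, 10L, 2); // overshadows A on [5, 10)
        List<Segment> full = Arrays.asList(older, newer);

        // Timeline built only from the split's own segment: A looks visible on all of [0, 10).
        for (Holder h : visibleHolders(Collections.singletonList(older))) {
            System.out.println("split-only timeline: " + h.segment.id + " [" + h.start + ", " + h.end + ")");
        }
        // Timeline built from the full set, then filtered: A is visible only where B does not cover it.
        for (Holder h : holdersForSplit(full, Collections.singleton("A"))) {
            System.out.println("full timeline, filtered: " + h.segment.id + " [" + h.start + ", " + h.end + ")");
        }
    }
}
```

In this sketch the subtask still has to know about every segment in the interval, which corresponds to either always specifying `interval` or passing the full (or at least the overlapping) segment list along with each split.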
