jihoonson commented on issue #7048: Make IngestSegmentFirehoseFactory splittable for parallel ingestion
URL: https://github.com/apache/incubator-druid/pull/7048#issuecomment-462932803

Is your concern that different sub tasks can process overlapping segments of different versions, so that both of them produce data for overlapping intervals, which can cause incorrect results? I think that's a valid concern.

To avoid this, we can add another restriction to split generation: each split should contain all segments that overlap one another. For example, suppose we have 3 segments: v1 @ 1:00-3:00, v2 @ 2:00-4:00, and v2 @ 4:00-6:00. Then we need 2 sub tasks, which process (v1 @ 1:00-3:00 and v2 @ 2:00-4:00) and (v2 @ 4:00-6:00), respectively. Each sub task should be able to generate a valid timeline from its given subset of segments.

I think this restriction is acceptable because this kind of overlap won't be common. What do you think?
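The grouping rule proposed in this comment could be sketched roughly as below. This is only an illustration, not the code in the PR: `Segment` and `group_overlapping` are hypothetical names, times are simplified to integer hours, and intervals are assumed to be half-open (start inclusive, end exclusive), which is what makes 2:00-4:00 and 4:00-6:00 non-overlapping in the example.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    # Hypothetical stand-in for a segment descriptor; not a Druid class.
    version: str
    start: int  # hour, inclusive
    end: int    # hour, exclusive (assumed half-open interval)


def group_overlapping(segments: List[Segment]) -> List[List[Segment]]:
    """Group segments so that any chain of overlapping segments shares one split."""
    splits: List[List[Segment]] = []
    current_end = None  # furthest end time seen in the split being built
    for seg in sorted(segments, key=lambda s: (s.start, s.end)):
        if current_end is not None and seg.start < current_end:
            # Overlaps something already in the current split: keep them together.
            splits[-1].append(seg)
            current_end = max(current_end, seg.end)
        else:
            # No overlap with the current split: start a new one.
            splits.append([seg])
            current_end = seg.end
    return splits


# The three segments from the example in the comment.
segments = [
    Segment("v1", 1, 3),
    Segment("v2", 2, 4),
    Segment("v2", 4, 6),
]
for i, split in enumerate(group_overlapping(segments)):
    print(i, [(s.version, s.start, s.end) for s in split])
```

With the example segments this yields two splits, (v1 @ 1-3, v2 @ 2-4) and (v2 @ 4-6), so each sub task would see every version covering its interval. Overlap is treated transitively here, which is one possible reading of "all overlapping segments".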
