jihoonson commented on issue #7048: Make IngestSegmentFirehoseFactory splittable for parallel ingestion
URL: https://github.com/apache/incubator-druid/pull/7048#issuecomment-462932803
 
 
   Is your concern that different sub tasks can process overlapping segments of
different versions, so that both of them produce data for overlapping intervals,
which can cause incorrect results? If so, I think it's a valid concern.
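To make the concern concrete, here is a minimal Python sketch of a simplified timeline model. It assumes that, at any instant, the highest-version segment covering that instant wins. This is a simplification for illustration, not Druid's actual timeline implementation. Segments are hypothetical `(version, start_hour, end_hour)` tuples with half-open intervals. A sub task that sees only v1 @ 1:00-3:00 would treat 2:00-3:00 as v1 data, even though the full set of segments says v2 should win there.

```python
from typing import List, Optional, Tuple

# Hypothetical simplified segment: (version, start_hour, end_hour), with the
# interval treated as half-open [start, end). Versions compare as strings here.
Segment = Tuple[str, int, int]


def visible_version(segments: List[Segment], hour: int) -> Optional[str]:
    """Version that a timeline built from `segments` would serve for `hour`,
    assuming the highest version among the covering segments always wins."""
    covering = [version for version, start, end in segments if start <= hour < end]
    return max(covering) if covering else None


def conflicting_hours(all_segments: List[Segment], subset: List[Segment]) -> List[int]:
    """Hours covered by `subset` where its local view disagrees with the full view."""
    first = min(start for _, start, _ in subset)
    last = max(end for _, _, end in subset)
    return [
        hour
        for hour in range(first, last)
        if visible_version(subset, hour) is not None
        and visible_version(subset, hour) != visible_version(all_segments, hour)
    ]


all_segments = [("v1", 1, 3), ("v2", 2, 4), ("v2", 4, 6)]
only_v1 = [("v1", 1, 3)]  # what a sub task would see if v1 were split off alone

print(conflicting_hours(all_segments, only_v1))  # [2]
```

Hour 2 (2:00-3:00) is exactly the range where the isolated sub task would emit overshadowed v1 data.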
   
   To avoid this, we can add another restriction to split generation: each
split should contain all segments that overlap with each other. For example,
suppose we have 3 segments: v1 @ 1:00-3:00, v2 @ 2:00-4:00, and v2 @ 4:00-6:00.
Then we need 2 sub tasks, which process (v1 @ 1:00-3:00 and v2 @ 2:00-4:00) and
(v2 @ 4:00-6:00), respectively. Each sub task should then be able to generate a
valid timeline from its given subset of segments.
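The restriction above amounts to grouping segments into connected components of overlapping intervals. Below is a minimal Python sketch of that grouping. `Segment` and `group_overlapping` are hypothetical names, not Druid APIs, and times are simplified to integer hours with half-open intervals. Segments are sorted by start time. Each segment joins the current split if it starts before that split's furthest end. Otherwise it starts a new split.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical simplified segment; interval is half-open [start, end)."""
    version: str
    start: int  # inclusive, in hours
    end: int    # exclusive, in hours


def group_overlapping(segments: List[Segment]) -> List[List[Segment]]:
    """Partition segments so that any two segments whose intervals overlap,
    directly or through a chain of overlaps, land in the same split."""
    splits: List[List[Segment]] = []
    current_end = None  # furthest end time seen in the current split
    for seg in sorted(segments, key=lambda s: (s.start, s.end)):
        if splits and seg.start < current_end:
            # Overlaps the current split: keep it together.
            splits[-1].append(seg)
            current_end = max(current_end, seg.end)
        else:
            # No overlap with the current split: start a new one.
            splits.append([seg])
            current_end = seg.end
    return splits


segments = [Segment("v1", 1, 3), Segment("v2", 2, 4), Segment("v2", 4, 6)]
splits = group_overlapping(segments)
print([[s.version for s in split] for split in splits])  # [['v1', 'v2'], ['v2']]
```

Because the check is `seg.start < current_end`, segments that only touch at a boundary (the 4:00 end and 4:00 start) go to separate splits, which gives the two sub tasks from the example. The trade-off is that one long segment can pull many others into a single split and reduce parallelism. This is acceptable only if such overlap is rare.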
   
   I think this restriction is acceptable because this kind of overlap
shouldn't be common. What do you think?
