jihoonson commented on issue #9463: Add namespaces to Druid segments within a 
data source
URL: https://github.com/apache/druid/issues/9463#issuecomment-598498946
 
 
   @JulianJaffePinterest thanks for more details.
   
   > The output of an intra-day run needs to overshadow any existing output for 
the time it's running for, which can be handled by overshadowing, but the 
intra-day output of a conversion pipeline shouldn't affect the output of any 
other pipeline, so simply synchronizing on version and using a linear shard 
spec won't work.
   
   Just FYI, you can actually do this with segment locking and task audit 
logging. Since you can track which task created which segments from the task 
audit logs, you can overwrite only the segments you want with segment 
locking. (Task audit logging is deprecated because we haven't found a good 
use case for it. If this is a popular use case, then we may need to consider 
bringing it back. See https://github.com/apache/druid/issues/5859 for 
details.) But it requires you to track all segments outside of Druid, and I 
guess this would be more complex than the proposal.
   
   As long as there is no unique use case that only this proposal can address, 
I'm more inclined to fix the union query properly because we have to do it 
anyway. As @himanshug mentioned, that could address most of the problems 
mentioned here, even though there is at least one more potential issue with 
query performance. The segment balancer uses a heuristic that segments are 
more likely to be queried together when their intervals are closer. Based on 
this heuristic, the segment balancer assigns segments with close intervals to 
different historicals so that they can be processed in parallel. This 
assumption doesn't apply to the union query. However, I don't know what the 
impact of this would be.
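
   To make the heuristic concrete, here is a toy sketch. It is not Druid's 
actual balancer code: the function names, the exponential decay, and the 
24-hour half-life are all assumptions chosen for illustration. It only shows 
the idea that placing segments with close intervals on the same server is 
"expensive", so a greedy placement spreads them out.

```python
import math

def interval_gap_hours(a, b):
    """Gap in hours between two (start, end) intervals; 0 if they overlap or touch."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))

def joint_cost(a, b, half_life_hours=24.0):
    """Toy co-location cost: highest for close intervals, decaying with distance.

    The half-life is a hypothetical tuning knob, not a Druid setting.
    """
    return math.exp(-math.log(2) * interval_gap_hours(a, b) / half_life_hours)

def place(segments, num_servers):
    """Greedily put each segment on the server where it adds the least cost."""
    servers = [[] for _ in range(num_servers)]
    for seg in segments:
        costs = [sum(joint_cost(seg, other) for other in s) for s in servers]
        servers[costs.index(min(costs))].append(seg)
    return servers

# Four consecutive hourly segments (start_hour, end_hour) on two servers.
segments = [(0, 1), (1, 2), (2, 3), (3, 4)]
servers = place(segments, 2)
print(servers)
```

   In this sketch, consecutive hourly segments end up alternating between the 
two servers. The cost only looks at intervals, which is where the assumption 
mentioned above comes from.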
   
   > That problem is a lot more cosmetic. It could either be handled by 
introducing concept of "Virtual DataSource" at the api layer on top of Druid 
that typically customers have or could be implemented as a feature in Druid 
itself.
   
   @himanshug this is true. I guess this could be a view in SQL.
