jihoonson commented on issue #9463: Add namespaces to Druid segments within a data source
URL: https://github.com/apache/druid/issues/9463#issuecomment-598498946

@JulianJaffePinterest thanks for the additional details.

> The output of an intra-day run needs to overshadow any existing output for the time it's running for, which can be handled by overshadowing, but the intra-day output of a conversion pipeline shouldn't affect the output of any other pipeline, so simply synchronizing on version and using a linear shard spec won't work.

Just FYI, you can actually do this with segment locking and task audit logging. Since the task audit logs record which task created which segments, you can use segment locking to overwrite only the segments you want. (Task audit logging is deprecated because we haven't found a good use case for it. If this turns out to be a popular use case, we may need to consider bringing it back. See https://github.com/apache/druid/issues/5859 for details.) However, this requires you to track all segments outside of Druid, and I guess it would be more complex than the proposal.

Unless there is a use case that only this proposal can address, I'm more inclined to fix the union query properly, because we have to do that anyway. As @himanshug mentioned, that could address most of the problems mentioned here, although there is at least one more potential issue, with query performance. The segment balancer uses the heuristic that segments are more likely to be queried together the closer their intervals are. Based on this heuristic, it assigns segments with close intervals to different historicals so that they can be processed in parallel. This assumption doesn't hold for the union query. However, I don't know how large the impact would be.

> That problem is a lot more cosmetic.
> It could either be handled by introducing concept of "Virtual DataSource" at the api layer on top of Druid that typically customers have or could be implemented as a feature in Druid itself.

@himanshug this is true. I guess this could be a view in SQL.
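To make the overshadowing problem in the first quote concrete, here is a minimal Python sketch under simplifying assumptions: segments are matched by exact interval, versions are ISO timestamp strings that sort chronologically, and each segment carries the ID of the task that published it (the kind of information a task audit log would provide). `Segment`, `visible_time_chunk`, `replace_by_task`, and the pipeline task IDs are hypothetical names for illustration, not Druid APIs; real overshadowing also handles partially overlapping intervals and minor versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    interval: str  # e.g. "2020-03-12/2020-03-13"; compared by exact match only
    version: str   # ISO timestamp string; later versions sort later
    partition: int
    task_id: str   # hypothetical: the publishing task, as an audit log would record


def visible_time_chunk(segments):
    """Interval-level overshadowing (simplified): within each interval,
    only segments with the highest version remain visible."""
    top = {}
    for s in segments:
        top[s.interval] = max(top.get(s.interval, s.version), s.version)
    return {s for s in segments if s.version == top[s.interval]}


def replace_by_task(segments, new_segments, task_prefix):
    """Segment-level replacement (simplified stand-in for what segment locking
    allows): drop only segments published by tasks whose ID starts with
    task_prefix, keep everything else, and add the new segments."""
    kept = {s for s in segments if not s.task_id.startswith(task_prefix)}
    return kept | set(new_segments)


day = "2020-03-12/2020-03-13"
existing = {
    Segment(day, "2020-03-12T00:00:00Z", 0, "pipeline_a_daily"),
    Segment(day, "2020-03-12T00:00:00Z", 1, "pipeline_b_daily"),
}
intraday_a = {Segment(day, "2020-03-12T06:00:00Z", 0, "pipeline_a_intraday")}

# A newer version for the whole interval also hides pipeline B's output.
print(sorted(s.task_id for s in visible_time_chunk(existing | intraday_a)))
# ['pipeline_a_intraday']

# Replacing by publishing task leaves pipeline B's output untouched.
print(sorted(s.task_id for s in replace_by_task(existing, intraday_a, "pipeline_a")))
# ['pipeline_a_intraday', 'pipeline_b_daily']
```

Note that `replace_by_task` only works because every segment carries its `task_id`; that bookkeeping is the outside-of-Druid tracking the comment expects to be more complex than the namespace proposal.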
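The balancer heuristic described in the comment can be sketched as a greedy placement that puts each segment on the historical where its summed proximity cost to already-placed segments is lowest. This is a simplified illustration, not Druid's balancer code: the exponential decay with a 24-hour half-life on the gap between interval midpoints, the 2x weight for segments of the same data source, and the names `Seg`, `pair_cost`, and `assign` are all assumptions made for the example.

```python
import math
from typing import NamedTuple

HALF_LIFE_HOURS = 24.0  # assumption: proximity weight halves per 24h of separation
LAMBDA = math.log(2) / HALF_LIFE_HOURS
SAME_DATASOURCE_WEIGHT = 2.0  # assumption: co-locating same-datasource segments costs more


class Seg(NamedTuple):
    start: float  # hours
    end: float    # hours
    datasource: str


def pair_cost(a: Seg, b: Seg) -> float:
    """Cost of placing a and b on the same server: larger when intervals are closer."""
    gap = abs((a.start + a.end) / 2 - (b.start + b.end) / 2)
    weight = SAME_DATASOURCE_WEIGHT if a.datasource == b.datasource else 1.0
    return weight * math.exp(-LAMBDA * gap)


def assign(segments, n_servers):
    """Greedily put each segment on the server with the lowest total cost."""
    servers = [[] for _ in range(n_servers)]
    for seg in segments:
        best = min(
            range(n_servers),
            key=lambda i: sum(pair_cost(seg, other) for other in servers[i]),
        )
        servers[best].append(seg)
    return servers


days = [Seg(24 * d, 24 * (d + 1), "events") for d in range(4)]
placement = assign(days, 2)
print([[s.start for s in server] for server in placement])
# [[0, 48], [24, 72]]
```

Consecutive days alternate between the two servers, which is the intended parallelism for a query over adjacent intervals of one data source. A union query instead reads the same interval from several data sources at once, a pattern this kind of heuristic is not built around; how much that matters in practice is the open question raised in the comment.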