loquisgon commented on pull request #12137: URL: https://github.com/apache/druid/pull/12137#issuecomment-1024657676
@JulianJaffePinterest Yeah, dropping the segments and replacing them with tombstones can be very similar in some cases. What I have been trying to say is that replace follows the standard way of segment replacement described in the Druid design documentation. However, let me share a scenario where replace with tombstones produces the intended semantics (as explained in this PR's design discussion) and drop does not.

Start with: 2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.000Z, DAY granularity, with rows in every hour (i.e., a full day of data in which each of the 24 hours contains rows; this is the typical wikipedia file, if you are familiar with it).

Replace with: 2015-09-12T08:00:30.000Z/2015-09-12T20:00:00.000Z, HOUR granularity. The interval covers 12 hours, but the input contains only three rows, each in a different hour.

In this case, drop would not drop the underlying DAY-granularity segment, because the replace interval does not cover it completely. As a result, only the hours in the replace interval that have replacement rows (three) are replaced. Since the underlying DAY segment is still there, it is only partially overshadowed, so all the other hours keep their data. At the end, every hour would still contain data, but the three replaced hours would now have a single row each.

With the new replace functionality (as stated in this PR's design and code), however, the replace generates 12 new segments: nine tombstones and three segments with data (one row each). These segments still only partially overshadow the existing DAY segment, but the net effect is that all data in the 12 hours of the replace interval is replaced by just the three new input rows (the other nine hours in the replace interval report no data).

I hope this example helps clarify the semantic difference.
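To make the difference concrete, here is a toy Python model of the scenario above. It is a simplification, not Druid's actual timeline code: time is bucketed into whole hours (so the 08:00:30 start is treated as hour 8), the three hours that receive rows (9, 13, 17) are hypothetical, and the `Segment`, `build_segments`, and `visible_by_hour` names are made up for illustration. Each hour is served by the highest-version segment covering it, which roughly mirrors how a newer segment partially overshadows an older one.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


# Hypothetical toy model, not Druid's real timeline implementation.
@dataclass(frozen=True)
class Segment:
    start_hour: int  # inclusive
    end_hour: int    # exclusive
    version: int     # higher version wins where segments overlap
    label: str       # "day" (original data), "new" (one row), or "tombstone" (no rows)


def build_segments(replace_hours: Iterable[int], hours_with_rows: Set[int],
                   use_tombstones: bool) -> List[Segment]:
    # The existing DAY-granularity segment (version 1) covers all 24 hours.
    # Under drop semantics it is NOT dropped, because the replace interval
    # does not cover it completely, so it stays in the list either way.
    segments = [Segment(0, 24, 1, "day")]
    for h in replace_hours:
        if h in hours_with_rows:
            # HOUR-granularity segment holding the replacement row.
            segments.append(Segment(h, h + 1, 2, "new"))
        elif use_tombstones:
            # Empty hour inside the replace interval: emit a tombstone.
            segments.append(Segment(h, h + 1, 2, "tombstone"))
    return segments


def visible_by_hour(segments: List[Segment]) -> Dict[int, str]:
    # For each hour, the highest-version covering segment is what queries see.
    visible = {}
    for h in range(24):
        covering = [s for s in segments if s.start_hour <= h < s.end_hour]
        visible[h] = max(covering, key=lambda s: s.version).label
    return visible


if __name__ == "__main__":
    replace_hours = range(8, 20)   # 08:00-20:00, i.e. 12 hours
    hours_with_rows = {9, 13, 17}  # hypothetical hours of the three input rows
    for mode in (False, True):
        view = visible_by_hour(build_segments(replace_hours, hours_with_rows, mode))
        name = "replace with tombstones" if mode else "drop"
        print(name, [view[h] for h in replace_hours])
```

Under drop, only hours 9, 13 and 17 change; the other nine hours of the replace interval still show the DAY segment's data. With tombstones, all 12 hours in the replace interval are overshadowed: nine by tombstones and three by the new rows.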
