loquisgon edited a comment on pull request #12137:
URL: https://github.com/apache/druid/pull/12137#issuecomment-1024657676


   @JulianJaffePinterest Yes, dropping the segments and replacing them with 
tombstones can be very similar in some cases. My point is that replace follows 
the standard segment replacement mechanism described in Druid's design 
documentation. However, let me share a scenario where replace with tombstones 
produces the intended semantics, as explained in the PR's design discussion, 
and drop does not.
   
   Start with:
   2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.000Z
   DAY granularity, rows in every hour (i.e. a full day of data where each of 
the 24 hours contains rows; this is the typical wikipedia file, if you are 
familiar with it).
   
   Replace with: 
   2015-09-12T08:00:30.000Z/2015-09-12T20:00:00.000Z
   HOUR granularity
   The interval covers 12 hours, but the input contains only three rows, in 
three different hours.
   
   In this case, drop would not drop the underlying DAY granularity segment, 
since the replace interval does not cover it completely. As a result, only the 
three hours in the replace interval that have replacement rows are replaced 
(three one-hour segments will be created). Because the underlying DAY segment 
is still there and is only partially overshadowed, all the other hours keep 
their original data. In the end, every hour would still contain data, but the 
three replaced hours would each hold a single row.
   
   However, with this new replace functionality (as stated in the design of 
this PR and in the code), the replace would generate 12 new segments: nine 
tombstones and three segments with data, one row each. All these segments 
would still only partially overshadow the existing DAY segment, but the net 
effect would be that all data in the 12 hours of the replace interval is 
replaced by just the three new rows in the input. All other hours in the 
replace interval would not report any data, since they are covered by the 
tombstones.
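   
   To make the difference concrete, here is a rough toy model in Python. It is 
not Druid code: it only tracks a row count per hour of 2015-09-12, and the 100 
rows per hour, the three hours that receive data (9, 13, 17), the function 
names, and the treatment of the replace interval as the twelve hourly buckets 
08 through 19 are all assumptions made for illustration.

```python
# Toy model only, not Druid code. Each hour of the day maps to a row count.

def visible_rows_drop(day_rows, new_rows):
    """Drop semantics: the DAY segment is not fully covered, so it stays.
    Only hours that received new rows are overshadowed."""
    result = dict(day_rows)
    for hour, count in new_rows.items():
        result[hour] = count
    return result


def visible_rows_tombstones(day_rows, replace_hours, new_rows):
    """Replace semantics: every hour in the replace interval gets a new
    segment, either one with data or an empty tombstone."""
    result = dict(day_rows)
    for hour in replace_hours:
        result[hour] = new_rows.get(hour, 0)  # 0 models a tombstone
    return result


day_rows = {h: 100 for h in range(24)}  # assumed: 100 rows in every hour
replace_hours = range(8, 20)            # assumed: 12 hourly buckets, 08 to 19
new_rows = {9: 1, 13: 1, 17: 1}         # assumed: three rows, three hours

dropped = visible_rows_drop(day_rows, new_rows)
replaced = visible_rows_tombstones(day_rows, replace_hours, new_rows)
tombstones = [h for h in replace_hours if h not in new_rows]
```

   In this model `dropped` still has data in all 24 hours, while `replaced` 
has data only in the three new hours inside the replace interval and keeps the 
original rows outside it; `tombstones` lists the nine empty hours.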
   
   I hope this example helps to clarify the semantic difference.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


