gianm edited a comment on pull request #12137:
URL: https://github.com/apache/druid/pull/12137#issuecomment-1030344922


   Very interesting!
   
   Some comments on the design first. Afterwards, a musing 🙂
   
   - Compatibility scenario 1: what happens if someone using dropExisting today 
does a rolling cluster update? MMs are typically updated first. When an ingest 
task gets launched on a new MM, it will generate tombstones. The Overlords, 
Coordinators, and Brokers will still be running old code. Will they be able to 
handle the tombstones properly? If not, then we should avoid repurposing the 
`dropExisting` flag.
   - Compatibility scenario 2: what happens if runs a job that generates 
tombstones and then rolls back to an earlier version of Druid? This is sort of 
a more extreme version of scenario 1. The Overlords, Coordinators, Brokers, and 
Historicals will all be running the older code. Will they be able to work in 
some reasonable way or will they have trouble handling the tombstones? If there 
will be trouble, we'll need to make it clear that people should avoid using 
this feature until they're sure they won't need to roll back.
   - Zero-sized segments make me wonder if some BalancerStrategy 
implementations will have problems handling them. So I'd suggest having tests 
with all current BalancerStrategy impls to make sure this works, or 
alternatively, using size: 1.
   - Not all query types are expected to return zero rows when the input 
segment has zero rows. This may be an issue for the NoOpQueryRunner based 
approach. For example, segmentMetadata queries on segments with zero rows 
should still return the segment id, etc.
   - In the case where tombstones + new segments fully overshadow an existing 
segment, there isn't a reason to keep the tombstones loaded up forever. They 
could be dropped from the cluster after the segment they're meant to shadow has 
been unloaded. I think doing this would involve some logic adjustments to the 
Coordinator, and would be valuable so people don't run into proliferations of 
tombstones. Would you be up for adding this logic, either in this patch or in a 
follow-up?
   - Special case of the above: what happens if a datasource is 100% composed 
of tombstones? Ideally, those tombstones would all get dropped and the 
datasource would be completely unloaded.
   
   ---
   
   The musing I promised: IMO this is going to be a very useful feature beyond 
the `dropExisting` use case. It'll give us a "delete" operation that is 
distinct from the "drop" operation. That's important because "drop" should be 
reversible by "re-load" but "delete" shouldn't be. For example, consider this 
case:
   
   1. You delete some time ranges, either using one of the Coordinator DELETE 
APIs or by using `dropExisting`. Today, these are implemented by marking the 
relevant segments unused.
   2. You don't ever add more data for these time ranges.
   3. Later on, you drop the table to save space. This marks all the rest of 
the segments in the table unused.
   4. Later still, you reload the table so you can query it again. This goes 
through and marks all the topmost segments for each time chunk as used.
   5. Surprise! The time ranges you thought you deleted in (1) are now back!
   
   This is confusing, and it's fixed by creating tombstones in (1) rather than 
marking the segments unused. That way, in step (4), the tombstones would block 
the unwanted segments from getting reloaded.
   
   So, this patch updates `dropExisting` to create tombstones, but IMO in the 
future we should also be creating tombstones for any operation that the user 
would logically think is more of a "delete" than a "drop".


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to