junegunn commented on PR #8001:
URL: https://github.com/apache/hbase/pull/8001#issuecomment-4181375884
_However_, even with qualifier comparison, false positives remain: a run of
exactly N consecutive redundant DCs for the same qualifier triggers an
inefficient seek. For example, with N=3:
- DC(q1) N=1 skip
- DC(q1) N=2 skip
- DC(q1) N=3 seek (false positive)
- DC(q2) N=1 skip
- DC(q2) N=2 skip
- DC(q2) N=3 seek (false positive)
- DC (q3) N=1 skip
- DC (q3) N=2 skip
- ...
This should be rare in practice. But if overhead is a concern, increasing N
is the only option.
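
To make the pattern above concrete, here is a toy model of the counting rule. It is only a sketch, not the code in this PR: `simulate` is a hypothetical helper, and it assumes the counter resets whenever the qualifier changes and that a seek fires on the Nth consecutive DC of a qualifier. A seek counts as a false positive when the run ends exactly at the Nth DC, so there is nothing left to skip.

```ruby
# Toy model (assumption, not HBase code): walk a stream of DeleteColumn
# qualifiers and count skips, seeks, and seeks that skipped nothing.
# Returns [skips, seeks, false_positive_seeks].
def simulate(qualifiers, threshold)
  skips = seeks = false_positives = 0
  i = 0
  while i < qualifiers.length
    q = qualifiers[i]
    # Find the end of the run of consecutive DCs for this qualifier.
    run_end = i
    run_end += 1 while run_end < qualifiers.length && qualifiers[run_end] == q
    run_length = run_end - i

    if run_length >= threshold
      skips += threshold - 1           # the DCs before the threshold is hit
      seeks += 1                       # the Nth DC triggers a seek
      false_positives += 1 if run_length == threshold
    else
      skips += run_length              # threshold never reached
    end
    i = run_end
  end
  [skips, seeks, false_positives]
end

# Three DCs per qualifier, ten qualifiers.
qualifiers = (0...30).map { |i| (i / 3).to_s }
p simulate(qualifiers, 3)   # => [20, 10, 10]
p simulate(qualifiers, 10)  # => [30, 0, 0]
```

In this model, every seek at N=3 is a false positive for runs of exactly three DCs, while N=10 never seeks on such runs and only pays for the extra skips.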
Here is a benchmark for this case, with an additional N=10 build:
```ruby
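benchmark(:DeleteColumnFalsePositiveEvery3) do |i|
  # Write the row once, on the first iteration.
  T.put(PUT) if i.zero?
  # (i / 3) yields the same qualifier for three consecutive iterations,
  # so each qualifier gets exactly three DeleteColumn markers.
  dc = Delete.new(ROW).addColumns(CF, (i / 3).to_s.to_java_bytes)
  T.delete(dc)
  # Flush every 100,000 deletes (but not at i == 0).
  flush 't' if (i % 100_000).zero? && i.positive?
end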
```
<img width="1152" height="960" alt="image"
src="https://github.com/user-attachments/assets/4fb939c6-4865-4da2-88ee-f25ad1312f95"
/>
As expected, qualifier comparison does not help in this case, but a larger
threshold (N=10) significantly reduces the overhead. Given the rarity of such
scenarios, the overhead against master is acceptable.
I believe 10 is a good threshold. This optimization targets cases where
hundreds of thousands or millions of delete markers are swept, so the cost of
10 extra skips is negligible.