junegunn commented on PR #8001:
URL: https://github.com/apache/hbase/pull/8001#issuecomment-4176485517
I ran another test to truly check the SEEK overhead. In this case, we
generate many `DeleteColumn`s for different qualifiers:
```ruby
benchmark(:DeleteColumnFalsePositive) do |i|
  T.put(PUT) if i.zero?
  dc = Delete.new(ROW).addColumns(CF, i.to_s.to_java_bytes)
  T.delete(dc)
  # Let's manually flush after every 100,000 operations because it's hard to
  # fill up the memstore only with delete markers.
  flush 't' if (i % 100_000).zero? && i.positive?
end
```
The resulting cells look like this:
- DC Q1
- DC Q2
- DC Q3
- DC Q4
- DC Q5
- ...
- Put Q0
With this layout, each SEEK advances the pointer by just one cell, so it
provides no advantage over SKIP. This is the worst case for this optimization.
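To make the worst case concrete, here is a minimal sketch of a toy model (not HBase code). It assumes a SEEK lands on the first later cell whose qualifier differs from the delete marker's; the helper name `cells_covered_by_seek` and the `favorable` layout are hypothetical and only for illustration.

```ruby
# Toy model, not HBase code: a cell is a [type, qualifier] pair in scan order.
# Assumption: a SEEK from a delete marker lands on the first later cell whose
# qualifier differs from the marker's qualifier.
def cells_covered_by_seek(cells, index)
  qualifier = cells[index][1]
  landing = ((index + 1)...cells.size).find { |j| cells[j][1] != qualifier } || cells.size
  landing - index
end

# Layout from this test: every DeleteColumn targets a different qualifier.
worst = (1..5).map { |i| [:delete_column, "Q#{i}"] } + [[:put, 'Q0']]
# Hypothetical favorable layout: one marker followed by cells of the same qualifier.
favorable = [[:delete_column, 'Q1']] + Array.new(4) { [:put, 'Q1'] } + [[:put, 'Q2']]

puts cells_covered_by_seek(worst, 0)      # prints 1
puts cells_covered_by_seek(favorable, 0)  # prints 5
```

In the first layout the SEEK moves past a single cell, which is exactly what a SKIP would do, so any extra cost of the SEEK is pure overhead.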
The test showed that I had underestimated the SEEK overhead. Increasing N to
10 did reduce it, though (see the `alt-n10` case).
<img width="1152" height="960" alt="image"
src="https://github.com/user-attachments/assets/5a930194-097f-4449-b8eb-68b6e483a972"
/>
To avoid a regression in such cases, we need to either:
- Choose a larger N value (10?) so that SEEK happens less frequently.
- Or add a qualifier comparison to the code, accepting the added complexity.
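
As a rough illustration of the first option, the sketch below counts SEEKs and SKIPs in a toy model (not HBase code). It assumes N means "SKIP over delete markers and issue a SEEK only on every N-th consecutive one"; the actual meaning of N in the patch may differ, and `Cell` and `scan_cost` are hypothetical names.

```ruby
# Toy model, not HBase code. Assumption: the scanner SKIPs delete markers one
# by one and issues a SEEK only on every n-th consecutive delete marker.
Cell = Struct.new(:type, :qualifier)

def scan_cost(cells, n)
  skips = 0
  seeks = 0
  streak = 0
  i = 0
  while i < cells.size
    cell = cells[i]
    if cell.type == :delete_column
      streak += 1
      if streak >= n
        seeks += 1
        streak = 0
        # The SEEK jumps past every following cell of the same qualifier.
        i += 1
        i += 1 while i < cells.size && cells[i].qualifier == cell.qualifier
      else
        skips += 1
        i += 1
      end
    else
      streak = 0
      i += 1
    end
  end
  { skips: skips, seeks: seeks }
end

# Worst case from this test: 100 delete markers, all for different qualifiers.
worst = (1..100).map { |i| Cell.new(:delete_column, "Q#{i}") } << Cell.new(:put, 'Q0')

[1, 10].each do |n|
  cost = scan_cost(worst, n)
  puts "N=#{n}: #{cost[:seeks]} seeks, #{cost[:skips]} skips"
end
# prints:
# N=1: 100 seeks, 0 skips
# N=10: 10 seeks, 90 skips
```

Under these assumptions a larger N trades SEEKs for cheaper SKIPs in the worst case, at the price of delaying the SEEK in layouts where it would actually help.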