junegunn commented on PR #8001:
URL: https://github.com/apache/hbase/pull/8001#issuecomment-4176485517

   I ran another test to truly check the SEEK overhead. In this case, we 
generate many `DeleteColumn`s for different qualifiers:
   
   ```ruby
   benchmark(:DeleteColumnFalsePositive) do |i|
     T.put(PUT) if i.zero?
   
     dc = Delete.new(ROW).addColumns(CF, i.to_s.to_java_bytes)
     T.delete(dc)
   
     # Let's manually flush after every 100,000 operations because it's hard to
     # fill up the memstore only with delete markers.
     flush 't' if (i % 100_000).zero? && i.positive?
   end
   ```
   
   - DC Q1
   - DC Q2
   - DC Q3
   - DC Q4
   - DC Q5
   - ...
   - Put Q0
   
   SEEK can advance the pointer by only one cell here, so it provides no 
advantage over SKIP. This is the worst case for this optimization.
   
   Testing revealed I underestimated the overhead. Increasing N to 10 helped 
reduce it though (see `alt-n10` case).
   
   <img width="1152" height="960" alt="image" 
src="https://github.com/user-attachments/assets/5a930194-097f-4449-b8eb-68b6e483a972"
 />
   
   To avoid regression in such cases, we need to either:
   - Choose a large N value (10?) so SEEK happens less frequently.
   - Or add qualifier comparison to the code, accepting the added complexity.
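
   To make the trade-off between the two options concrete, here is a toy 
model in plain Ruby. It is not the code in this PR. It assumes that N is the 
number of consecutive delete markers seen before a SEEK is issued, and that 
"qualifier comparison" means issuing a SEEK only when the next cell has the 
same qualifier as the marker. `Cell`, `worst_case_cells`, 
`seeks_with_threshold`, and `seeks_with_qualifier_check` are hypothetical 
names made up for this sketch.

   ```ruby
   # Toy model only: a cell is a (type, qualifier) pair in scan order.
   Cell = Struct.new(:type, :qualifier)

   # Worst case from the benchmark above: one DeleteColumn per distinct qualifier.
   def worst_case_cells(count)
     (1..count).map { |i| Cell.new(:delete_column, "q#{i}") }
   end

   # Assumed policy A: issue one SEEK after every n consecutive delete markers.
   def seeks_with_threshold(cells, n)
     seeks = 0
     run = 0
     cells.each do |cell|
       if cell.type == :delete_column
         run += 1
         if run == n
           seeks += 1
           run = 0
         end
       else
         run = 0
       end
     end
     seeks
   end

   # Assumed policy B: SEEK only when the next cell shares the marker's
   # qualifier, i.e. when a SEEK could actually skip something.
   def seeks_with_qualifier_check(cells)
     cells.each_cons(2).count do |current, following|
       current.type == :delete_column && following.qualifier == current.qualifier
     end
   end

   cells = worst_case_cells(1_000)
   puts seeks_with_threshold(cells, 1)     # 1000 seeks, none of them useful
   puts seeks_with_threshold(cells, 10)    # 100 seeks
   puts seeks_with_qualifier_check(cells)  # 0 seeks
   ```

   Under these assumptions, a larger N only divides the number of wasted 
seeks, while the qualifier check removes them for this workload at the cost 
of one extra comparison per delete marker.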


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
