steFaiz commented on code in PR #6738:
URL: https://github.com/apache/paimon/pull/6738#discussion_r2588659837


##########
paimon-common/src/main/java/org/apache/paimon/lookup/sort/BlockIterator.java:
##########
@@ -70,32 +62,58 @@ public void remove() {
         throw new UnsupportedOperationException();
     }
 
+    /**
+     * Seeks to the first entry with a key that is either equal to or greater 
than the specified
+     * key. After this call, the next invocation of {@link 
BlockIterator#next()} will return that
+     * entry (if it exists).
+     *
+     * <p>Note that the comparing value must be monotonically increasing 
across current block e.g.
+     * key and some special values such as the {@code lastRecordPosition} of 
an {@code
+     * IndexBlockEntry}.
+     *
+     * @param targetKey target key
+     * @return true if found an equal record
+     */
     public boolean seekTo(MemorySlice targetKey) {
         int left = 0;
         int right = recordCount - 1;
+        int mid = left + (right - left) / 2;
 
         while (left <= right) {
-            int mid = left + (right - left) / 2;
+            mid = left + (right - left) / 2;
 
             seekTo(mid);
             BlockEntry midEntry = readEntry();
             int compare = comparator.compare(midEntry.getKey(), targetKey);
 
             if (compare == 0) {
-                polled = midEntry;
-                return true;
+                break;
             } else if (compare > 0) {
-                polled = midEntry;
                 right = mid - 1;
             } else {
                 left = mid + 1;

Review Comment:
   ## **Quick Answer**
   It can't pass the Test QAQ
   <img width="642" height="158" alt="image" 
src="https://github.com/user-attachments/assets/e1fc0a1b-94fa-4b1a-b353-e0a7f7aa4797";
 />
   ## **Reason**:
   if we just let `polled` = null here, it means that if we are seeking to some 
value exactly not present in current block, the `polled` will always be null. 
At that case, the next iterator position after seek will always be the position 
after the last `MID` position (because we will seek to mid and call readNext 
for each loop).
   Let's recall the example before:
   First case: 
   * At that case, the target element is skipped!
   ```
                          Iter Pos
                              ∆
                              |
                              | 
   values:   ... 10, 20, 30, 40, 50 ...
                      |   |
                      V   V
                    HIGH MID
                         LOW
   ```
   Second case:
   * At that case, the iteration process is expected.
   ```
                    Iter Pos
                       ∆   
                       |   
   values:    10, 20, 30, 40, 50
                   |   |
                   V   V
                  MID LOW
                 HIGH
   ```
   ## **Thinking**
   I think if we really want to keep the `polled` field, which is always the 
target element (if has some), we need to seek to the next position of polled 
before returning. I think it will be equilvalent to current implementation. And 
removing `polled` field makes the code more clean and readable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to