luoyuxia opened a new issue, #3290:
URL: https://github.com/apache/fluss/issues/3290

   ### Search before asking
   - [x] I searched in the [issues](https://github.com/apache/fluss/issues) and 
found nothing similar.
   
   ### Fluss version
   main (development)
   
   ### Please describe the bug 🐞
   In `TieringSplitReader.forLogRecords`, the lake writer is created before
checking whether the currently polled records contain any record with
`record.logOffset() < stoppingOffset`.
   
   If the polled batch contains only records whose offsets have already reached
or passed the split's `stoppingOffset`, the split can still be marked finished
based on the last record offset, even though nothing is written into the lake writer.
   
   This shows up with logically empty batches that still advance offsets, for
example updates on a table using the first-row merge engine, or deletes of a
non-existent key. In that case `recordWriter.complete()` may fail during the
tiering commit with:
   
   ```
   The size of CommitMessage must be 1, but got [].
   ```
   
   A concrete sequence is:
   1. `TieringSplitReader` subscribes to a log split with a finite
`stoppingOffset`.
   2. `forLogRecords` receives `bucketScanRecords` for that bucket, but every 
record has `logOffset() >= stoppingOffset`.
   3. A lake writer has already been created, although no record is written.
   4. The reader sees `lastRecord.logOffset() >= stoppingOffset - 1`, finishes 
the split, and calls `completeLakeWriter()`.
   5. The underlying lake writer completes an empty write and fails, for 
example Paimon throws `The size of CommitMessage must be 1, but got [].`
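   The sequence above can be reproduced with a small, self-contained simulation. This is a sketch under stated assumptions, not Fluss code: `LogRecord`, `FakeLakeWriter`, and `EagerWriterRepro` are hypothetical stand-ins, the fake writer's empty-completion check only mimics the Paimon error quoted above, and the write result is modeled as a plain list of written offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the eager-creation bug; not Fluss or Paimon code.
public class EagerWriterRepro {

    /** A polled log record; only the offset matters for this sketch. */
    record LogRecord(long logOffset) {}

    /** Stand-in lake writer that refuses to complete an empty write. */
    static final class FakeLakeWriter {
        private final List<Long> written = new ArrayList<>();

        void write(LogRecord record) {
            written.add(record.logOffset());
        }

        List<Long> complete() {
            if (written.isEmpty()) {
                // Mimics the Paimon failure quoted in the issue.
                throw new IllegalStateException(
                        "The size of CommitMessage must be 1, but got [].");
            }
            return written;
        }
    }

    private final long stoppingOffset;
    private FakeLakeWriter lakeWriter; // kept across polls until the split finishes

    EagerWriterRepro(long stoppingOffset) {
        this.stoppingOffset = stoppingOffset;
    }

    /** Returns the write result once the split finishes, otherwise null. */
    List<Long> forLogRecords(List<LogRecord> batch) {
        if (batch.isEmpty()) {
            return null;
        }
        // Step 3: the writer is created before checking any offset.
        if (lakeWriter == null) {
            lakeWriter = new FakeLakeWriter();
        }
        for (LogRecord record : batch) {
            if (record.logOffset() < stoppingOffset) {
                lakeWriter.write(record);
            }
        }
        // Step 4: the split is finished based on the last record offset.
        LogRecord lastRecord = batch.get(batch.size() - 1);
        if (lastRecord.logOffset() >= stoppingOffset - 1) {
            return completeLakeWriter();
        }
        return null;
    }

    private List<Long> completeLakeWriter() {
        FakeLakeWriter writer = lakeWriter;
        lakeWriter = null;
        return writer == null ? null : writer.complete();
    }

    public static void main(String[] args) {
        EagerWriterRepro reader = new EagerWriterRepro(10);
        try {
            // Step 2: every polled offset is >= stoppingOffset.
            reader.forLogRecords(List.of(new LogRecord(10), new LogRecord(11)));
        } catch (IllegalStateException e) {
            // Step 5: prints "The size of CommitMessage must be 1, but got []."
            System.out.println(e.getMessage());
        }
    }
}
```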
   
   ### Solution
   Create the lake writer lazily, only when the first record satisfying
`record.logOffset() < stoppingOffset` is encountered. If no record is actually
written for the batch/split, keep the current `completeLakeWriter()` behavior
and return a null write result.
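   A minimal sketch of the proposed lazy creation, using hypothetical stand-in types (`LogRecord`, `FakeLakeWriter`, `LazyWriterSketch`) rather than the real `TieringSplitReader` or lake writer APIs. It assumes, as the issue describes, that `completeLakeWriter()` returns a null write result when no writer was ever created, while the split is still marked finished.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of lazy lake-writer creation; not Fluss or Paimon code.
public class LazyWriterSketch {

    /** A polled log record; only the offset matters for this sketch. */
    record LogRecord(long logOffset) {}

    /** Stand-in lake writer that refuses to complete an empty write. */
    static final class FakeLakeWriter {
        private final List<Long> written = new ArrayList<>();

        void write(LogRecord record) {
            written.add(record.logOffset());
        }

        List<Long> complete() {
            if (written.isEmpty()) {
                throw new IllegalStateException(
                        "The size of CommitMessage must be 1, but got [].");
            }
            return written;
        }
    }

    private final long stoppingOffset;
    private FakeLakeWriter lakeWriter; // null until a record below stoppingOffset arrives
    private boolean finished;

    LazyWriterSketch(long stoppingOffset) {
        this.stoppingOffset = stoppingOffset;
    }

    boolean isFinished() {
        return finished;
    }

    /** Returns the write result (possibly null) once the split finishes, otherwise null. */
    List<Long> forLogRecords(List<LogRecord> batch) {
        if (batch.isEmpty() || finished) {
            return null;
        }
        for (LogRecord record : batch) {
            if (record.logOffset() < stoppingOffset) {
                // Lazy creation: only when a record will actually be written.
                if (lakeWriter == null) {
                    lakeWriter = new FakeLakeWriter();
                }
                lakeWriter.write(record);
            }
        }
        LogRecord lastRecord = batch.get(batch.size() - 1);
        if (lastRecord.logOffset() >= stoppingOffset - 1) {
            finished = true;
            return completeLakeWriter();
        }
        return null;
    }

    /** No writer means nothing was written for the split: return a null result. */
    private List<Long> completeLakeWriter() {
        FakeLakeWriter writer = lakeWriter;
        lakeWriter = null;
        return writer == null ? null : writer.complete();
    }

    public static void main(String[] args) {
        // Only offsets >= stoppingOffset: the split finishes with a null result.
        LazyWriterSketch empty = new LazyWriterSketch(10);
        System.out.println(empty.forLogRecords(List.of(new LogRecord(10), new LogRecord(11)))); // null
        System.out.println(empty.isFinished()); // true

        // A split spanning two polls keeps one writer until it finishes.
        LazyWriterSketch normal = new LazyWriterSketch(10);
        normal.forLogRecords(List.of(new LogRecord(7), new LogRecord(8)));
        System.out.println(normal.forLogRecords(List.of(new LogRecord(9), new LogRecord(10)))); // [7, 8, 9]
    }
}
```

   Because the writer is only created inside the `logOffset() < stoppingOffset` branch, a batch made entirely of offsets at or beyond `stoppingOffset` never calls `complete()` on an empty writer, while a split that spans several polls still keeps a single writer open until it finishes.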
   
   ### Are you willing to submit a PR?
   - [x] I'm willing to submit a PR!
   

