hudi-bot opened a new issue, #17408: URL: https://github.com/apache/hudi/issues/17408
Flink writer does not support record positions for updates and deletes yet; support will be added later.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-9192
- Type: Sub-task
- Parent: https://issues.apache.org/jira/browse/HUDI-9075
- Fix version(s):
  - 1.2.0

---

## Comments

30/Apr/25 09:09;geserdugarov;While researching this task, I raised more questions than I answered about how to implement it properly. To begin with, the Flink integration doesn't support record position processing even for `HoodieRecord`s. I also couldn't find any explicit description of how the "log record positions" feature should work, so we have to use the Spark implementation as an example.

Writing log record positions and using them were split across several PRs and implemented by different developers:

1) additional property in log block header (without writing): [https://github.com/apache/hudi/pull/9376]
2) writing log record positions in log block header: [https://github.com/apache/hudi/pull/9581]
3) using log record positions from log block headers in file group readers: [https://github.com/apache/hudi/pull/9819]

The confusing part is that we write log record positions as a set in `HoodieLogBlock::addRecordPositionsIfRequired`: [https://github.com/apache/hudi/blob/6f84c401b3a809997be1573b0d04e8106fd87fac/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieLogBlock.java#L390-L392]

but later extract them as a list in `PositionBasedFileGroupRecordBuffer::extractRecordPositions`: [https://github.com/apache/hudi/blob/6f84c401b3a809997be1573b0d04e8106fd87fac/hudi-common/src/main/java/org/apache/hudi/common/table/read/PositionBasedFileGroupRecordBuffer.java#L305-L307]

and use them as a list in `PositionBasedFileGroupRecordBuffer::processDataBlock`: [https://github.com/apache/hudi/blob/6f84c401b3a809997be1573b0d04e8106fd87fac/hudi-common/src/main/java/org/apache/hudi/common/table/read/PositionBasedFileGroupRecordBuffer.java#L132-L136]
;;;

---

30/Apr/25 11:51;geserdugarov;While checking the cases in which log record positions are written, I found another issue, with index configuration not being persisted: https://github.com/apache/hudi/issues/13241;;;

---

30/Apr/25 12:03;geserdugarov;A lot of different questions need to be clarified first, so I have postponed this task for now and unassigned it from myself so that anybody can pick it up.;;;
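
For anyone picking this up, the set-versus-list concern from the 09:09 comment can be illustrated with a minimal sketch. It uses only `java.util` collections; the class and method names are hypothetical and this is not Hudi's actual serialization or reader code. It assumes the reader pairs the i-th extracted position with the i-th record in the block, which should be verified against `processDataBlock`. Under that assumption, collecting positions into a set on write is lossless only when records were written in ascending position order with no duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RecordPositionsSketch {

    // Hypothetical writer side: positions are collected into a sorted set,
    // so duplicates are dropped and the original record order is lost.
    public static Set<Long> writePositions(List<Long> positionsInRecordOrder) {
        return new TreeSet<>(positionsInRecordOrder);
    }

    // Hypothetical reader side: the stored set is materialized as a list
    // and (by assumption) indexed in step with the records of the block.
    public static List<Long> readPositions(Set<Long> stored) {
        return new ArrayList<>(stored);
    }

    // True only if the list read back matches the positions in record order,
    // i.e. the i-th position still belongs to the i-th record.
    public static boolean roundTripPreservesPairing(List<Long> positionsInRecordOrder) {
        List<Long> readBack = readPositions(writePositions(positionsInRecordOrder));
        return readBack.equals(positionsInRecordOrder);
    }

    public static void main(String[] args) {
        // Ascending and unique: pairing survives the round trip.
        System.out.println(roundTripPreservesPairing(List.of(3L, 7L, 12L)));
        // Out of order: the set sorts the positions, pairing breaks.
        System.out.println(roundTripPreservesPairing(List.of(7L, 3L, 12L)));
        // Duplicate position: the set drops one entry, sizes no longer match.
        System.out.println(roundTripPreservesPairing(List.of(3L, 3L, 12L)));
    }
}
```

If a Flink writer can emit records whose positions are not strictly ascending within a block, one of the open questions for this task would be whether the write side should keep an ordered list instead of a set.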
