[ 
https://issues.apache.org/jira/browse/HUDI-9192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948406#comment-17948406
 ] 

Geser Dugarov commented on HUDI-9192:
-------------------------------------

While researching this task, I've raised more questions than I've found answers 
on how to implement it properly.

 

To begin with, the Flink integration doesn't support processing record 
positions, even for `HoodieRecord`s. I also didn't find any explicit 
description of how the "log record positions" feature should work, so we have 
to use the Spark implementation as a reference.

Writing log record positions and using them were split across different PRs 
and implemented by different developers:

1) adding a property to the log block header (without writing positions yet): 
[https://github.com/apache/hudi/pull/9376]

2) writing log record positions in log block header: 
[https://github.com/apache/hudi/pull/9581]

3) using log record positions from log block headers in file group readers: 
[https://github.com/apache/hudi/pull/9819]

 

The confusing part here is that we write log record positions as a set in 
`HoodieLogBlock::addRecordPositionsIfRequired`:

[https://github.com/apache/hudi/blob/6f84c401b3a809997be1573b0d04e8106fd87fac/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieLogBlock.java#L390-L392]

But later extract them as a list in 
`PositionBasedFileGroupRecordBuffer::extractRecordPositions`:

[https://github.com/apache/hudi/blob/6f84c401b3a809997be1573b0d04e8106fd87fac/hudi-common/src/main/java/org/apache/hudi/common/table/read/PositionBasedFileGroupRecordBuffer.java#L305-L307]

and use them as a list in 
`PositionBasedFileGroupRecordBuffer::processDataBlock`:

[https://github.com/apache/hudi/blob/6f84c401b3a809997be1573b0d04e8106fd87fac/hudi-common/src/main/java/org/apache/hudi/common/table/read/PositionBasedFileGroupRecordBuffer.java#L132-L136]
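
To show why the set/list mismatch matters, here is a minimal Java sketch. It 
does not use any Hudi classes: `collectAsSet` and `extractAsList` are 
hypothetical stand-ins for the write and read sides, and the choice of 
`TreeSet` is only an assumption for this illustration, not a statement about 
how Hudi actually encodes the header. It shows that a round trip through a set 
can lose both the original order and any duplicate positions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PositionOrderSketch {

    // Hypothetical write side: positions are collected into a set.
    // A TreeSet sorts its elements and drops duplicates.
    static Set<Long> collectAsSet(List<Long> positionsInWriteOrder) {
        return new TreeSet<>(positionsInWriteOrder);
    }

    // Hypothetical read side: the stored set is exposed as a list.
    // The list order is the set's iteration order, not the write order.
    static List<Long> extractAsList(Set<Long> stored) {
        return new ArrayList<>(stored);
    }

    public static void main(String[] args) {
        // Positions in the order the records were appended to the block
        // (made-up values for illustration).
        List<Long> written = Arrays.asList(5L, 2L, 9L, 2L);
        List<Long> readBack = extractAsList(collectAsSet(written));

        System.out.println("written:   " + written);
        System.out.println("read back: " + readBack);
        System.out.println("same size: " + (written.size() == readBack.size()));
    }
}
```

If the reader pairs the i-th extracted position with the i-th record of the 
block, that pairing is only correct when records are written in ascending 
position order with no repeated positions. Whether Hudi guarantees this is 
exactly the open question.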

 

> [RFC-87] RowData log handle supports writing record positions to log block
> --------------------------------------------------------------------------
>
>                 Key: HUDI-9192
>                 URL: https://issues.apache.org/jira/browse/HUDI-9192
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: flink-sql
>            Reporter: Shuo Cheng
>            Assignee: Geser Dugarov
>            Priority: Major
>
> The Flink writer does not support record positions for updates and deletes 
> yet; support will be added later.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
