[ 
https://issues.apache.org/jira/browse/HUDI-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18005221#comment-18005221
 ] 

sivabalan narayanan commented on HUDI-9590:
-------------------------------------------

Purpose: 

The optimized log block scan exists to support compacted log blocks: the V1 scan 
does not support them, while V2 (the optimized log block scan) does. 

Apart from that, the two scans should not deviate in behavior. 

 

Impl: 

From the core logic, I did not find any difference between V1 and V2, apart 
from the compacted log block support. 


{code:java}
Conceptual difference
🔹 scanInternalV1


Assumes simple linear append of log blocks.

Each block is processed as it's read.

Rollback COMMAND_BLOCK immediately triggers removal of matching blocks in stack.

At end: deduplication and ordering fixes for any spurious duplicates.{code}
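
To make the V1 shape concrete, here is a minimal Java sketch of the single-pass, stack-based idea described above. It is only an illustration: `ScanV1Sketch`, `LogBlock`, `BlockType`, and `scan` are hypothetical stand-ins and not Hudi's actual reader classes, and the final deduplication step is left out because what counts as a duplicate is not spelled out here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model; not Hudi's actual reader classes.
class ScanV1Sketch {
    enum BlockType { DATA, DELETE, ROLLBACK_COMMAND }

    static final class LogBlock {
        final BlockType type;
        final String instantTime;
        final String targetInstantTime; // set only for rollback command blocks

        LogBlock(BlockType type, String instantTime, String targetInstantTime) {
            this.type = type;
            this.instantTime = instantTime;
            this.targetInstantTime = targetInstantTime;
        }

        static LogBlock data(String instantTime) {
            return new LogBlock(BlockType.DATA, instantTime, null);
        }

        static LogBlock rollback(String instantTime, String targetInstantTime) {
            return new LogBlock(BlockType.ROLLBACK_COMMAND, instantTime, targetInstantTime);
        }
    }

    // Single pass: each block is handled as soon as it is read.
    static List<LogBlock> scan(List<LogBlock> logBlocks) {
        Deque<LogBlock> stack = new ArrayDeque<>();
        for (LogBlock block : logBlocks) {
            if (block.type == BlockType.ROLLBACK_COMMAND) {
                // A rollback immediately removes the matching blocks already on the stack.
                stack.removeIf(b -> b.instantTime.equals(block.targetInstantTime));
            } else {
                stack.addLast(block);
            }
        }
        // Deduplication of spurious duplicates would happen here; omitted in this sketch.
        return new ArrayList<>(stack);
    }

    public static void main(String[] args) {
        List<LogBlock> result = scan(Arrays.asList(
            LogBlock.data("001"),
            LogBlock.data("002"),
            LogBlock.rollback("003", "002"),
            LogBlock.data("004")));
        for (LogBlock b : result) {
            System.out.println(b.type + "@" + b.instantTime);
        }
    }
}
```

With the sample input in `main`, the block written at instant 002 is dropped as soon as the rollback is read, so only the blocks for 001 and 004 remain.
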
{code:java}
🔹 scanInternalV2

Designed to handle multi-writer, out-of-order rollbacks, and log compaction.

Uses two stages:

Forward scan:

Build a map of instantTime → blocks.

Collect rollback targets.

Reverse scan:

Reconstruct correct block sequence.

Resolve any compacted blocks.






4️⃣ Algorithm shape


V1:

Single pass.

Immediate filtering, stack manipulation.

Deduplication at the end.


V2:

Pass 1: Build map, remove rollback targets.

Pass 2: Reverse iteration, resolve compactions.

Queue building done in controlled, finalized second phase.


✅ V2 intentionally separates scanning from resolution.

5️⃣ Handling of Corrupt Blocks


Both versions:

Increment corrupt block counters.

Skip reading their contents.




✅ Similar.

✅ Pseudocode-level difference
scanV1 (rough)
for each logBlock in logFiles:
    if isRollbackCommand:
        remove matching blocks immediately from stack
    else if isDataOrDeleteBlock:
        push to stack
deduplicate(stack)
process(stack)

scanV2 (rough)
for each logBlock in logFiles:
    if isRollbackCommand:
        record targetInstant
        remove target from map
    else:
        store in map by instantTime
reverse order instants:
    resolve compaction mapping
    add only final compacted forms to output queue
process(queue) {code}
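
The two-pass V2 shape above could be sketched in Java roughly as follows. This is a simplified illustration under stated assumptions, not the actual Hudi code: `ScanV2Sketch`, `LogBlock`, `compactedInstants`, and `scan` are hypothetical names, a compacted block is assumed to list the instants it replaces, and it is assumed to appear later in the log than the blocks it replaces. Rollback targets are removed from the map as soon as the rollback is read, following the rough pseudocode.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model; not Hudi's actual reader classes.
class ScanV2Sketch {
    enum BlockType { DATA, DELETE, ROLLBACK_COMMAND }

    static final class LogBlock {
        final BlockType type;
        final String instantTime;
        final String targetInstantTime;      // set only for rollback command blocks
        final Set<String> compactedInstants; // instants this block replaces; empty if not compacted

        LogBlock(BlockType type, String instantTime, String targetInstantTime,
                 Set<String> compactedInstants) {
            this.type = type;
            this.instantTime = instantTime;
            this.targetInstantTime = targetInstantTime;
            this.compactedInstants = compactedInstants;
        }

        static LogBlock data(String instantTime) {
            return new LogBlock(BlockType.DATA, instantTime, null, Collections.<String>emptySet());
        }

        static LogBlock rollback(String instantTime, String targetInstantTime) {
            return new LogBlock(BlockType.ROLLBACK_COMMAND, instantTime, targetInstantTime,
                Collections.<String>emptySet());
        }

        static LogBlock compacted(String instantTime, String... replaced) {
            return new LogBlock(BlockType.DATA, instantTime, null,
                new HashSet<>(Arrays.asList(replaced)));
        }
    }

    static List<LogBlock> scan(List<LogBlock> logBlocks) {
        // Pass 1 (forward): group blocks by instant time and drop rollback targets.
        Map<String, List<LogBlock>> blocksByInstant = new LinkedHashMap<>();
        for (LogBlock block : logBlocks) {
            if (block.type == BlockType.ROLLBACK_COMMAND) {
                blocksByInstant.remove(block.targetInstantTime);
            } else {
                blocksByInstant.computeIfAbsent(block.instantTime, k -> new ArrayList<>()).add(block);
            }
        }

        // Pass 2 (reverse): walk instants newest-first so a surviving compacted block
        // is seen before the instants it replaces, which are then skipped.
        List<String> instants = new ArrayList<>(blocksByInstant.keySet());
        Collections.reverse(instants);
        Set<String> superseded = new HashSet<>();
        Deque<LogBlock> queue = new ArrayDeque<>();
        for (String instant : instants) {
            if (superseded.contains(instant)) {
                continue;
            }
            List<LogBlock> blocks = blocksByInstant.get(instant);
            for (int i = blocks.size() - 1; i >= 0; i--) {
                LogBlock block = blocks.get(i);
                superseded.addAll(block.compactedInstants);
                queue.addFirst(block); // addFirst restores oldest-to-newest order
            }
        }
        return new ArrayList<>(queue);
    }
}
```

Because resolution only happens after the forward pass has finished, a compacted block that is itself rolled back never hides the original blocks: in this sketch they simply remain in the map and are emitted in the second pass.
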




 

 

> OptimizedLogBlockScan support w/ FG reader
> ------------------------------------------
>
>                 Key: HUDI-9590
>                 URL: https://issues.apache.org/jira/browse/HUDI-9590
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: reader-core, writer-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>
> We have compacted log block support in the older version of the Log Record Reader. 
> We need to bring the FG reader to parity as well. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
