[
https://issues.apache.org/jira/browse/HUDI-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914622#comment-17914622
]
Sagar Sumit commented on HUDI-8819:
-----------------------------------
In the scope of this ticket, we need to check with Athena as the reader instead
of Spark. As long as the following conditions are met, Athena should be able to
read tables written in table format version 6 by the 1.0.0 writer.
# Glue catalog sync configs are set correctly and meta sync is running fine.
# [Migration
protocol|https://hudi.apache.org/docs/deployment#upgrading-to-100] has been
followed (note that, initially, we need to keep both the metadata table and
autoUpgrade configs set to false for the writer).
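For reference, a minimal sketch of the writer options implied by the two conditions above. This is an illustration, not a verified config set: the database/table names and path are placeholders, and the option keys are the standard Hudi 1.0 write/meta-sync configs as I recall them, so please double-check against the docs for your exact version.

{code:scala}
// Sketch only: backward-compatible writer (table version 6) + Glue catalog sync.
df.write.format("hudi")
  .option("hoodie.table.name", "my_table")                 // placeholder name
  // Migration protocol: stay on table version 6 and keep both the
  // metadata table and auto-upgrade disabled initially.
  .option("hoodie.write.table.version", "6")
  .option("hoodie.metadata.enable", "false")
  .option("hoodie.write.auto.upgrade", "false")
  // Glue catalog sync so Athena can discover the table via meta sync.
  .option("hoodie.datasource.meta.sync.enable", "true")
  .option("hoodie.meta.sync.client.tool.class",
          "org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool")
  .option("hoodie.datasource.hive_sync.database", "my_db") // placeholder db
  .option("hoodie.datasource.hive_sync.table", "my_table")
  .mode("append")
  .save("s3://bucket/path/my_table")                       // placeholder path
{code}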
[~yc2523] Could you please confirm whether the issue still happens after
following the protocol? I did not see the configs being set correctly in the
description of this ticket.
> Hudi 1.0's backward writer's UPDATE/DELETE would corrupt older versioned Hudi
> table
> -----------------------------------------------------------------------------------
>
> Key: HUDI-8819
> URL: https://issues.apache.org/jira/browse/HUDI-8819
> Project: Apache Hudi
> Issue Type: Sub-task
> Affects Versions: 1.0.0
> Reporter: Shawn Chang
> Assignee: Davis Zhang
> Priority: Blocker
> Fix For: 1.0.1
>
> Time Spent: 7h
> Remaining Estimate: 0h
>
> Reproduction:
> # Create a table with some rows using Hudi 0.14 + Spark 3.5.0
> # Use Hudi 1.0.0 + Spark 3.5.3 as the writer and set
> .option("hoodie.write.table.version", 6) to enable the backward-compatible
> writer
>
> # After updating some rows, read with Hudi 1.0.0 + Spark 3.5.3:
> spark.read.format("hudi").load(tablePath)
>
> # The read results from Hudi 1.0.0 + Spark 3.5.3 contain only the updated
> rows
> # The same happens with DELETE: if we delete some rows with Hudi 1.0.0 +
> Spark 3.5.3, the Spark reader can only see the delete blocks, which contain
> zero rows
> # An older-versioned Hudi reader (Athena) can still see the correct results
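The reproduction quoted above could be sketched roughly as follows. The table path, schema, and record key are illustrative, and the snippet assumes a Spark session with Hudi on the classpath; it is meant to show the sequence of steps, not a verified test case.

{code:scala}
import spark.implicits._

val tablePath = "s3://bucket/hudi/test_table" // placeholder path

// Step 1: with Hudi 0.14 + Spark 3.5.0, create a table with some rows.
Seq((1, "a"), (2, "b")).toDF("id", "data")
  .write.format("hudi")
  .option("hoodie.table.name", "test_table")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .mode("overwrite")
  .save(tablePath)

// Steps 2-3: with Hudi 1.0.0 + Spark 3.5.3, update a row using the
// backward-compatible writer (table version 6).
Seq((1, "a_updated")).toDF("id", "data")
  .write.format("hudi")
  .option("hoodie.table.name", "test_table")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.write.table.version", 6)
  .mode("append")
  .save(tablePath)

// Step 4: reading back with Hudi 1.0.0 + Spark 3.5.3 reportedly shows
// only the updated row instead of the full table.
spark.read.format("hudi").load(tablePath).show()
{code}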
--
This message was sent by Atlassian Jira
(v8.20.10#820010)