[
https://issues.apache.org/jira/browse/HUDI-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914622#comment-17914622
]
Sagar Sumit commented on HUDI-8819:
-----------------------------------
In the scope of this ticket, we need to check with Athena as the reader instead
of Spark. As long as the following conditions are met, Athena should be able to
read tables written in table format version 6 by the 1.0.0 writer.
# Glue catalog sync configs are set correctly and meta sync is running fine.
# [Migration
protocol|https://hudi.apache.org/docs/deployment#upgrading-to-100] has been
followed (note that, initially, we need to keep both the metadata table and
autoUpgrade configs set to false for the writer).
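For reference, a minimal sketch of the writer options implied by the two conditions above. This is an illustration, not a verified config set: the database/table names and path are placeholders, and the option keys are the standard Hudi 1.0 write/meta-sync configs as I recall them, so please double-check against the docs for your exact version.

{code:scala}
// Sketch only: backward-compatible writer (table version 6) + Glue catalog sync.
df.write.format("hudi")
  .option("hoodie.table.name", "my_table")                 // placeholder name
  // Migration protocol: stay on table version 6 and keep both the
  // metadata table and auto-upgrade disabled initially.
  .option("hoodie.write.table.version", "6")
  .option("hoodie.metadata.enable", "false")
  .option("hoodie.write.auto.upgrade", "false")
  // Glue catalog sync so Athena can discover the table via meta sync.
  .option("hoodie.datasource.meta.sync.enable", "true")
  .option("hoodie.meta.sync.client.tool.class",
          "org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool")
  .option("hoodie.datasource.hive_sync.database", "my_db") // placeholder db
  .option("hoodie.datasource.hive_sync.table", "my_table")
  .mode("append")
  .save("s3://bucket/path/my_table")                       // placeholder path
{code}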
[~yc2523] Could you please confirm whether the issue still happens after
following the protocol? I did not see the configs being set correctly in the
description of this ticket.
> Hudi 1.0's backward writer's UPDATE/DELETE would corrupt older versioned Hudi
> table
> -----------------------------------------------------------------------------------
>
> Key: HUDI-8819
> URL: https://issues.apache.org/jira/browse/HUDI-8819
> Project: Apache Hudi
> Issue Type: Sub-task
> Affects Versions: 1.0.0
> Reporter: Shawn Chang
> Assignee: Davis Zhang
> Priority: Blocker
> Fix For: 1.0.1
>
> Time Spent: 7h
> Remaining Estimate: 0h
>
> Reproduction:
> # Create a table with some rows using Hudi 0.14 + Spark 3.5.0
> # Use Hudi 1.0.0 + Spark 3.5.3 as the writer and set
> .option("hoodie.write.table.version", 6) to enable the backward-compatible
> writer
>
> # After updating some rows, read with Hudi 1.0.0 + Spark 3.5.3:
> spark.read.format("hudi").load(tablePath)
>
> # The read results from Hudi 1.0.0 + Spark 3.5.3 contain only the updated
> rows
> # The same happens with DELETE: if we delete some rows with Hudi 1.0.0 +
> Spark 3.5.3, the Spark reader can only see the delete blocks, which contain
> zero rows
> # An older-versioned Hudi reader (Athena) can still see the correct results
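The reproduction quoted above could be sketched roughly as follows. The table path, schema, and record key are illustrative, and the snippet assumes a Spark session with Hudi on the classpath; it is meant to show the sequence of steps, not a verified test case.

{code:scala}
import spark.implicits._

val tablePath = "s3://bucket/hudi/test_table" // placeholder path

// Step 1: with Hudi 0.14 + Spark 3.5.0, create a table with some rows.
Seq((1, "a"), (2, "b")).toDF("id", "data")
  .write.format("hudi")
  .option("hoodie.table.name", "test_table")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .mode("overwrite")
  .save(tablePath)

// Steps 2-3: with Hudi 1.0.0 + Spark 3.5.3, update a row using the
// backward-compatible writer (table version 6).
Seq((1, "a_updated")).toDF("id", "data")
  .write.format("hudi")
  .option("hoodie.table.name", "test_table")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.write.table.version", 6)
  .mode("append")
  .save(tablePath)

// Step 4: reading back with Hudi 1.0.0 + Spark 3.5.3 reportedly shows
// only the updated row instead of the full table.
spark.read.format("hudi").load(tablePath).show()
{code}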
--
This message was sent by Atlassian Jira
(v8.20.10#820010)