[
https://issues.apache.org/jira/browse/HUDI-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932169#comment-17932169
]
Ranga Reddy commented on HUDI-9079:
-----------------------------------
*Migration Steps:*
{noformat}
"default" partition detected. From 0.12, we are changing the default partition
in hudi to "__HIVE_DEFAULT_PARTITION__". Please read and write back the data in
"default" partition in hudi to new partition path "__HIVE_DEFAULT_PARTITION__".
"
Sample spark command to re-write the data:

val df = spark.read.format("hudi").load(HUDI_TABLE_PATH)
  .filter(col("PARTITION_PATH_COLUMN") === "default")

df.drop("_hoodie_commit_time").drop("_hoodie_commit_seqno")
  .drop("_hoodie_record_key").drop("_hoodie_partition_path")
  .drop("_hoodie_file_name")
  .withColumn(PARTITION_PATH_COLUMN, lit("__HIVE_DEFAULT_PARTITION__"))
  .write.format("hudi").options(writeOptions).mode(Append).save(HUDI_TABLE_PATH)
Please fix values for PARTITION_PATH_COLUMN and HUDI_TABLE_PATH, and set all
write configs in the above command before running. Also delete the records in
the old partition once the above command succeeds. Sample spark command to
delete old partition records:

val df = spark.read.format("hudi").load(HUDI_TABLE_PATH)
  .filter(col("PARTITION_PATH_COLUMN") === "default")

df.write.format("hudi")
  .option("hoodie.datasource.write.operation", "delete")
  .options(writeOptions).mode(Append).save(HUDI_TABLE_PATH)
{noformat}
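For reference, the two steps above can be combined into a single spark-shell session. This is only a sketch: HUDI_TABLE_PATH, the partition path column name, and writeOptions are placeholders that must be replaced with the values for your table.

```scala
// Sketch only: HUDI_TABLE_PATH, partitionPathColumn and writeOptions are
// placeholders -- substitute your table's values before running in spark-shell.
import org.apache.spark.sql.SaveMode.Append
import org.apache.spark.sql.functions.{col, lit}

val HUDI_TABLE_PATH = "/path/to/hudi/table"       // placeholder
val partitionPathColumn = "partition_col"         // placeholder
val writeOptions: Map[String, String] = Map()     // your hudi write configs

// Step 1: re-write rows from the legacy "default" partition into the new
// "__HIVE_DEFAULT_PARTITION__" partition path.
val df = spark.read.format("hudi").load(HUDI_TABLE_PATH)
  .filter(col(partitionPathColumn) === "default")

df.drop("_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
        "_hoodie_partition_path", "_hoodie_file_name")
  .withColumn(partitionPathColumn, lit("__HIVE_DEFAULT_PARTITION__"))
  .write.format("hudi").options(writeOptions).mode(Append).save(HUDI_TABLE_PATH)

// Step 2: delete the rows still in the old "default" partition. Note the
// .format("hudi") on the write: without it the delete output is written as
// plain parquet, which is the bug this issue tracks.
spark.read.format("hudi").load(HUDI_TABLE_PATH)
  .filter(col(partitionPathColumn) === "default")
  .write.format("hudi")
  .option("hoodie.datasource.write.operation", "delete")
  .options(writeOptions).mode(Append).save(HUDI_TABLE_PATH)
```

The delete in step 2 is an upsert-style commit with the delete operation, so it goes through the Hudi write path and updates the timeline rather than dropping raw files.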
> Log the exception message properly to handle the "default" partition value
> migration steps.
> -------------------------------------------------------------------------------------------
>
> Key: HUDI-9079
> URL: https://issues.apache.org/jira/browse/HUDI-9079
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.12.0, 0.13.0, 0.14.0, 0.15.0, 1.0.0
> Reporter: Ranga Reddy
> Priority: Major
> Labels: pull-request-available
>
> In the step for deleting the default partition, we need to specify the format
> as Hudi during the Spark writing process; otherwise, it will be treated as a
> normal Parquet file.
>
> {code:java}
> df.write.option("hoodie.datasource.write.operation","delete").options(writeOptions).mode(Append).save(HUDI_TABLE_PATH);{code}
> https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/FourToFiveUpgradeHandler.java#L59
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)