[
https://issues.apache.org/jira/browse/HUDI-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zouxxyy updated HUDI-5057:
--------------------------
Description:
When `hoodie.datasource.write.hive_style_partitioning` is disabled, running the
`msck repair table` SQL statement fails to sync the partitions on the file system
to the catalog.
For example:
1. create the table with Spark SQL
{code:java}
create table h0 (
id int,
name string,
ts long,
dt string)
using hudi
partitioned by (dt)
location '/tmp/test'
tblproperties (
primaryKey = 'id',
preCombineField = 'ts',
hoodie.datasource.write.hive_style_partitioning = 'false');{code}
2. write data to a partition
{code:java}
import org.apache.hudi.DataSourceWriteOptions.{PARTITIONPATH_FIELD, PRECOMBINE_FIELD, RECORDKEY_FIELD}
import org.apache.hudi.keygen.constant.KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE
import org.apache.spark.sql.SaveMode
import spark.implicits._

// ts is declared as long in the table schema, so write it as a Long
val df = Seq((1, "a1", 1000L, "2022-10-06")).toDF("id", "name", "ts", "dt")
df.write.format("hudi")
  .option(RECORDKEY_FIELD.key, "id")
  .option(PRECOMBINE_FIELD.key, "ts")
  .option(PARTITIONPATH_FIELD.key, "dt")
  .option(HIVE_STYLE_PARTITIONING_ENABLE.key, "false")
  .mode(SaveMode.Append)
  .save("/tmp/test"){code}
3. run `msck repair table` with Spark SQL
{code:java}
msck repair table h0;{code}
4. list partitionNames
{code:java}
val table = spark.sessionState.sqlParser.parseTableIdentifier("h0");
spark.sessionState.catalog.listPartitionNames(table).toArray;{code}
It should return ("dt=2022-10-06"), but it actually returns ().
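For context, a minimal sketch (not from the issue, and not Hudi code) of why the repair misses these partitions: with hive-style partitioning disabled, Hudi writes partition directories as bare values (e.g. `/tmp/test/2022-10-06`) rather than in the `col=value` form (`/tmp/test/dt=2022-10-06`) that `msck repair table` scans for. The helper below is hypothetical and only illustrates the two directory layouts.
{code:java}
// Hypothetical illustration: the partition directory name produced
// with and without hive-style partitioning.
object PartitionPathDemo {
  def partitionPath(field: String, value: String, hiveStyle: Boolean): String =
    if (hiveStyle) s"$field=$value" else value

  def main(args: Array[String]): Unit = {
    // hive-style: msck repair can map the directory back to the dt column
    println(partitionPath("dt", "2022-10-06", hiveStyle = true))  // dt=2022-10-06
    // non-hive-style: a bare value directory, not recognized by msck repair
    println(partitionPath("dt", "2022-10-06", hiveStyle = false)) // 2022-10-06
  }
}
{code}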
was:
When `hoodie.datasource.write.hive_style_partitioning` is disabled, running the
`msck repair table` SQL statement fails to identify the partitions on the file
system and add them to the catalog.
For example:
1. create the table with Spark SQL
{code:java}
create table h0 (
id int,
name string,
ts long,
dt string)
using hudi
partitioned by (dt)
location '/tmp/test'
tblproperties (
primaryKey = 'id',
preCombineField = 'ts',
hoodie.datasource.write.hive_style_partitioning = 'false');{code}
2. write data to a partition
{code:java}
val df = Seq((1, "a1", 1000, "2022-10-06")).toDF("id", "name", "ts", "dt");
df.write.format("hudi")
.option(RECORDKEY_FIELD.key, "id")
.option(PRECOMBINE_FIELD.key, "ts")
.option(PARTITIONPATH_FIELD.key, "dt")
.option(HIVE_STYLE_PARTITIONING_ENABLE.key, "false")
.mode(SaveMode.Append)
.save("/tmp/test");{code}
3. run `msck repair table` with Spark SQL
{code:java}
msck repair table h0;{code}
4. list partitionNames
{code:java}
val table = spark.sessionState.sqlParser.parseTableIdentifier("h0");
spark.sessionState.catalog.listPartitionNames(table).toArray;{code}
It should return ("dt=2022-10-06"), but it actually returns ().
> Fix msck repair table
> ---------------------
>
> Key: HUDI-5057
> URL: https://issues.apache.org/jira/browse/HUDI-5057
> Project: Apache Hudi
> Issue Type: Bug
> Components: spark-sql
> Affects Versions: 0.12.0
> Reporter: zouxxyy
> Assignee: zouxxyy
> Priority: Major
>
> When `hoodie.datasource.write.hive_style_partitioning` is disabled, running the
> `msck repair table` SQL statement fails to sync the partitions on the file
> system to the catalog.
> For example:
> 1. create the table with Spark SQL
> {code:java}
> create table h0 (
> id int,
> name string,
> ts long,
> dt string)
> using hudi
> partitioned by (dt)
> location '/tmp/test'
> tblproperties (
> primaryKey = 'id',
> preCombineField = 'ts',
> hoodie.datasource.write.hive_style_partitioning = 'false');{code}
> 2. write data to a partition
> {code:java}
> val df = Seq((1, "a1", 1000, "2022-10-06")).toDF("id", "name", "ts", "dt");
> df.write.format("hudi")
> .option(RECORDKEY_FIELD.key, "id")
> .option(PRECOMBINE_FIELD.key, "ts")
> .option(PARTITIONPATH_FIELD.key, "dt")
> .option(HIVE_STYLE_PARTITIONING_ENABLE.key, "false")
> .mode(SaveMode.Append)
> .save("/tmp/test");{code}
> 3. run `msck repair table` with Spark SQL
> {code:java}
> msck repair table h0;{code}
> 4. list partitionNames
> {code:java}
> val table = spark.sessionState.sqlParser.parseTableIdentifier("h0");
> spark.sessionState.catalog.listPartitionNames(table).toArray;{code}
> It should return ("dt=2022-10-06"), but it actually returns ().
--
This message was sent by Atlassian Jira
(v8.20.10#820010)