[
https://issues.apache.org/jira/browse/HUDI-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
xi chaomin updated HUDI-5047:
-----------------------------
Description:
When I sync to Hive with hoodie.datasource.write.drop.partition.columns=false, a
query with the partition in the where clause returns an empty result, e.g.
"select * from mor_table where partition=$partition".
So I set hoodie.datasource.write.drop.partition.columns = true for the Hive sync.
With hoodie.datasource.write.drop.partition.columns = true, however, the updated
records can't be read from the MOR table.
Steps to reproduce:
# write data and query
{code:java}
import spark.implicits._
import org.apache.spark.sql.SaveMode.Append

val df1 = Seq(
  ("100", "1001", "2022-01-01"),
  ("200", "1002", "2022-01-01"),
  ("300", "1003", "2022-01-01"),
  ("400", "1004", "2022-01-02"),
  ("500", "1005", "2022-01-02"),
  ("600", "1006", "2022-01-02")
).toDF("id", "name", "dt")

val hudiOptions = Map(
  "hoodie.table.name" -> tableName,
  "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
  "hoodie.datasource.write.operation" -> "upsert",
  "hoodie.datasource.write.recordkey.field" -> "id",
  "hoodie.datasource.write.precombine.field" -> "name",
  "hoodie.datasource.write.partitionpath.field" -> "dt",
  "hoodie.index.type" -> "BLOOM",
  "hoodie.table.keygenerator.class" -> "org.apache.hudi.keygen.SimpleKeyGenerator",
  "hoodie.datasource.write.drop.partition.columns" -> "true"
)

df1.write.format("hudi")
  .options(hudiOptions)
  .mode(Append)
  .save(basePath)

val viewDF = spark
  .read
  .format("org.apache.hudi")
  .load(basePath)

viewDF.createOrReplaceTempView(tableName)
spark.sql(s"select * from $tableName where dt='2022-01-01'").show() {code}
The query returns the expected rows.
# update records and query
{code:java}
val df2 = Seq(
  ("100", "10010", "2022-01-01"),
  ("200", "10020", "2022-01-01"),
  ("300", "10030", "2022-01-01"),
  ("400", "10040", "2022-01-02"),
  ("500", "10050", "2022-01-02"),
  ("600", "10060", "2022-01-02")
).toDF("id", "name", "dt")

df2.write.format("hudi")
  .options(hudiOptions)
  .mode(Append)
  .save(basePath)

val viewDF2 = spark
  .read
  .format("org.apache.hudi")
  .load(basePath)

viewDF2.createOrReplaceTempView(tableName)
spark.sql(s"select * from $tableName where dt='2022-01-01'").show() {code}
The second query returns 0 records.
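Why the second query comes back empty can be sketched without Hudi at all. The `Rec` type and hard-coded values below are hypothetical and only illustrate the mechanism; this is not Hudi's actual merge code:

```scala
// With hoodie.datasource.write.drop.partition.columns=true, the partition
// value can be recovered from the file path for base files, but MOR log
// records are stored without it. A snapshot read prefers the newer log
// record for a key, so a partition predicate on the merged record fails.
case class Rec(id: String, name: String, dt: Option[String])

val baseRecord = Rec("100", "1001", Some("2022-01-01")) // dt recovered from the partition path
val logRecord  = Rec("100", "10010", None)              // dt was dropped when the update was logged

// The merge keeps the newer log record for record key "100".
val merged = Seq(logRecord)

val matched = merged.filter(_.dt.contains("2022-01-01"))
println(matched.size) // 0: the updated record no longer carries its partition value
```

This is why the fix tracked in this issue is to re-attach the partition value in HoodieLogRecordReader when the column was dropped at write time.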
> Add partition value in HoodieLogRecordReader when
> hoodie.datasource.write.drop.partition.columns=true
> -----------------------------------------------------------------------------------------------------
>
> Key: HUDI-5047
> URL: https://issues.apache.org/jira/browse/HUDI-5047
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: xi chaomin
> Priority: Major
> Labels: pull-request-available
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)