[ https://issues.apache.org/jira/browse/HUDI-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921709#comment-17921709 ]
Sagar Sumit commented on HUDI-8928:
-----------------------------------
Ran [TestSparkSqlWithCustomKeyGenerator|https://github.com/apache/hudi/blob/02472c91aac1892d76602795c3f816b58e9c90f7/hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestSparkSqlWithCustomKeyGenerator.scala#L255]; could not reproduce the issue.
{code:java}
import org.apache.spark.sql.SaveMode

// tableType, tableName and tablePath are assumed to be defined by the surrounding test.
// Six rows across three segments; ts is in epoch seconds.
val df = spark.sql(
  s"""SELECT 1 as id, 'a1' as name, 1.6 as price, 1704121827 as ts, 'cat1' as segment
     | UNION
     | SELECT 2 as id, 'a2' as name, 10.8 as price, 1704121827 as ts, 'cat1' as segment
     | UNION
     | SELECT 3 as id, 'a3' as name, 30.0 as price, 1706800227 as ts, 'cat1' as segment
     | UNION
     | SELECT 4 as id, 'a4' as name, 103.4 as price, 1701443427 as ts, 'cat2' as segment
     | UNION
     | SELECT 5 as id, 'a5' as name, 1999.0 as price, 1704121827 as ts, 'cat2' as segment
     | UNION
     | SELECT 6 as id, 'a6' as name, 80.0 as price, 1704121827 as ts, 'cat3' as segment
     |""".stripMargin)
df.write.format("hudi")
  .option("hoodie.datasource.write.table.type", tableType)
  .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.CustomKeyGenerator")
  .option("hoodie.datasource.write.partitionpath.field", "ts:timestamp,segment:simple")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "name")
  .option("hoodie.table.name", tableName)
  .option("hoodie.insert.shuffle.parallelism", "1")
  .option("hoodie.upsert.shuffle.parallelism", "1")
  .option("hoodie.bulkinsert.shuffle.parallelism", "1")
  .option("hoodie.keygen.timebased.timestamp.type", "SCALAR")
  .option("hoodie.keygen.timebased.output.dateformat", "yyyyMM")
  .option("hoodie.keygen.timebased.timestamp.scalar.time.unit", "seconds")
  .mode(SaveMode.Overwrite)
  .save(tablePath)

spark.read.format("hudi").load(tablePath).show(false)
+-------------------+---------------------+------------------+----------------------+--------------------------------------------------------------------------+---+----+------+----------+-------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                         |id |name|price |ts        |segment|
+-------------------+---------------------+------------------+----------------------+--------------------------------------------------------------------------+---+----+------+----------+-------+
|20250128120421899  |20250128120421899_0_0|2                 |202401/cat1           |390ee34b-6466-46fb-99da-9c7010d87413-0_0-181-278_20250128120421899.parquet|2  |a2  |10.8  |1704121827|cat1   |
|20250128120421899  |20250128120421899_0_1|1                 |202401/cat1           |390ee34b-6466-46fb-99da-9c7010d87413-0_0-181-278_20250128120421899.parquet|1  |a1  |1.6   |1704121827|cat1   |
|20250128120421899  |20250128120421899_2_0|6                 |202401/cat3           |0b744252-3504-47f6-83b5-36a8ad1d9bbd-0_2-181-280_20250128120421899.parquet|6  |a6  |80.0  |1704121827|cat3   |
|20250128120421899  |20250128120421899_4_0|4                 |202312/cat2           |99e01331-1443-4058-bca7-c35cd56b3c77-0_4-181-282_20250128120421899.parquet|4  |a4  |103.4 |1701443427|cat2   |
|20250128120421899  |20250128120421899_3_0|3                 |202402/cat1           |3b33ac51-c756-4b04-af35-cb358f9ba80a-0_3-181-281_20250128120421899.parquet|3  |a3  |30.0  |1706800227|cat1   |
|20250128120421899  |20250128120421899_1_0|5                 |202401/cat2           |3c8ec017-883d-4ba5-b550-437ee3cc2246-0_1-181-279_20250128120421899.parquet|5  |a5  |1999.0|1704121827|cat2   |
+-------------------+---------------------+------------------+----------------------+--------------------------------------------------------------------------+---+----+------+----------+-------+
{code}
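
As a cross-check of the {{_hoodie_partition_path}} values in the output above, the following is a minimal sketch that recomputes the expected paths outside of Hudi using only {{java.time}}. It assumes the timestamp part is formatted in UTC (the snippet above does not set a timezone) and that the timestamp and simple parts are joined with "/". The object and helper names are hypothetical and are not part of Hudi.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object PartitionPathCheck {
  // Mirrors hoodie.keygen.timebased.output.dateformat = "yyyyMM".
  // UTC is an assumption; the write options above do not set a timezone.
  private val formatter: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyyMM").withZone(ZoneOffset.UTC)

  // Hypothetical helper: formats an epoch-seconds value, matching the
  // SCALAR timestamp type with a time unit of seconds.
  def timestampPartition(epochSeconds: Long): String =
    formatter.format(Instant.ofEpochSecond(epochSeconds))

  // Hypothetical helper: joins the timestamp part and the simple part with "/".
  def partitionPath(epochSeconds: Long, segment: String): String =
    s"${timestampPartition(epochSeconds)}/$segment"

  def main(args: Array[String]): Unit = {
    // (id, ts, segment) for three of the rows written above.
    val rows = Seq((1, 1704121827L, "cat1"), (3, 1706800227L, "cat1"), (4, 1701443427L, "cat2"))
    rows.foreach { case (id, ts, segment) =>
      println(s"id=$id -> ${partitionPath(ts, segment)}")
    }
  }
}
```

Under these assumptions the three sample rows map to 202401/cat1, 202402/cat1 and 202312/cat2, which agrees with the table output above.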
> Fix timestamp based partitioning and drop partition support with 0.15.0 and
> 1.0
> -------------------------------------------------------------------------------
>
> Key: HUDI-8928
> URL: https://issues.apache.org/jira/browse/HUDI-8928
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: reader-core
> Reporter: sivabalan narayanan
> Assignee: Sagar Sumit
> Priority: Blocker
> Fix For: 1.0.1
>
> Attachments: Screenshot 2025-01-28 at 4.12.49 PM.png
>
>
> Based on our analysis, drop partition support is broken in 0.15.0 for tables
> with multiple partition fields.
>
> For nested fields, a field with the same name but a different path is
> replaced by the partition value.
> For the timestamp issue, the field is replaced with the partition value
> instead of the value in the file (for example:
> {{"timestamp_micros_nullable_field":"2025-01-25T00:00:00.000Z"}}).
> There is also a regression in drop partition: the dropped partition is still
> being read.
> The replace commit is not written correctly in 0.15.0:
> {{partitionToReplaceFileIds}} contains a map with an empty list instead of
> the file group IDs for the partition.
>
> We need a fix for 0.15.0.
> 1.0 has yet to be tested; it is not clear whether it is broken as well.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)