[
https://issues.apache.org/jira/browse/HUDI-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
rex xiong updated HUDI-3818:
----------------------------
Description:
When a binary (bytes) column is used as the primary key, Hudi generates the same
record key for every row, so upserts end up writing only one row.
{code:java}
scala> sql("desc extended binary_test1").show()
+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
| _hoodie_commit_time|              string|   null|
|_hoodie_commit_seqno|              string|   null|
|  _hoodie_record_key|              string|   null|
|_hoodie_partition...|              string|   null|
|   _hoodie_file_name|              string|   null|
|                  id|              binary|   null|
|                name|              string|   null|
|                  dt|              string|   null|
|                    |                    |       |
|# Detailed Table ...|                    |       |
|            Database|             default|       |
|               Table|        binary_test1|       |
|               Owner|                root|       |
|        Created Time|Sat Apr 02 13:28:...|       |
|         Last Access|             UNKNOWN|       |
|          Created By|         Spark 3.2.0|       |
|                Type|             MANAGED|       |
|            Provider|                hudi|       |
|    Table Properties|[last_commit_time...|       |
|          Statistics|        435194 bytes|       |
+--------------------+--------------------+-------+
scala> sql("select * from binary_test1").show()
+-------------------+--------------------+--------------------+----------------------+--------------------+--------------------+---------+--------+
|_hoodie_commit_time|_hoodie_commit_seqno|  _hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|                  id|     name|      dt|
+-------------------+--------------------+--------------------+----------------------+--------------------+--------------------+---------+--------+
|  20220402132927590|20220402132927590...|id:java.nio.HeapB...|                      |1a06106e-5e7a-4e6...|[03 45 6A 00 00 0...|Mary Jane|20220401|
+-------------------+--------------------+--------------------+----------------------+--------------------+--------------------+---------+--------+{code}
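The record key in the output reads id:java.nio.HeapByteBuffer[...], which looks like the result of ByteBuffer.toString(). That method reports only position, limit and capacity, never the bytes, so any two 16-byte ids render identically. The following is a minimal sketch of that effect, not Hudi's actual key generator: the class BinaryKeyDemo and the helpers naiveKey and hexKey are hypothetical names introduced here, and the "field:value" layout is an assumption inferred from the output. hexKey shows one possible content-based encoding, purely for contrast.

```java
import java.nio.ByteBuffer;

// Illustrative only: these helpers are hypothetical and are not Hudi's key generator.
class BinaryKeyDemo {

    // Assumed key layout "<field>:<value.toString()>", inferred from the
    // "id:java.nio.HeapByteBuffer[...]" value seen in _hoodie_record_key.
    static String naiveKey(String field, Object value) {
        return field + ":" + value;
    }

    // Alternative sketch: derive the key from the buffer's content as hex.
    // duplicate() leaves the caller's position and limit untouched.
    static String hexKey(String field, ByteBuffer value) {
        ByteBuffer view = value.duplicate();
        StringBuilder hex = new StringBuilder();
        while (view.hasRemaining()) {
            hex.append(String.format("%02X", view.get() & 0xFF));
        }
        return field + ":" + hex;
    }

    public static void main(String[] args) {
        byte[] a = new byte[16];
        a[0] = 0x03; a[1] = 0x45; a[2] = 0x6A;  // same leading bytes as the row in the issue
        byte[] b = new byte[16];
        b[0] = 0x7F;                             // a different 16-byte id

        ByteBuffer first = ByteBuffer.wrap(a);
        ByteBuffer second = ByteBuffer.wrap(b);

        // Both lines print id:java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
        System.out.println(naiveKey("id", first));
        System.out.println(naiveKey("id", second));

        // Content-based keys differ between the two ids
        System.out.println(hexKey("id", first));
        System.out.println(hexKey("id", second));
    }
}
```

If keys are built this way, every row with a 16-byte id collapses onto one record key, which would explain why repeated upserts leave a single row in the table.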
was:
{code:java}
scala> sql("desc extended binary_test1").show(false)
+----------------------------+--------------------------------------------------------------------------------------+-------+
|col_name                    |data_type                                                                             |comment|
+----------------------------+--------------------------------------------------------------------------------------+-------+
|_hoodie_commit_time         |string                                                                                |null   |
|_hoodie_commit_seqno        |string                                                                                |null   |
|_hoodie_record_key          |string                                                                                |null   |
|_hoodie_partition_path      |string                                                                                |null   |
|_hoodie_file_name           |string                                                                                |null   |
|id                          |binary                                                                                |null   |
|name                        |string                                                                                |null   |
|dt                          |string                                                                                |null   |
|                            |                                                                                      |       |
|# Detailed Table Information|                                                                                      |       |
|Database                    |default                                                                               |       |
|Table                       |binary_test1                                                                          |       |
|Owner                       |root                                                                                  |       |
|Created Time                |Sat Apr 02 13:28:29 CST 2022                                                          |       |
|Last Access                 |UNKNOWN                                                                               |       |
|Created By                  |Spark 3.2.0                                                                           |       |
|Type                        |MANAGED                                                                               |       |
|Provider                    |hudi                                                                                  |       |
|Table Properties            |[last_commit_time_sync=20220402132927590, preCombineField=id, primaryKey=id, type=cow]|       |
|Statistics                  |435194 bytes                                                                          |       |
+----------------------------+--------------------------------------------------------------------------------------+-------+
only showing top 20 rows
scala> sql("select * from binary_test1").show(false)
+-------------------+---------------------+-----------------------------------------------+----------------------+--------------------------------------------------------------------------+-------------------------------------------------+---------+--------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                             |_hoodie_partition_path|_hoodie_file_name                                                         |id                                               |name     |dt      |
+-------------------+---------------------+-----------------------------------------------+----------------------+--------------------------------------------------------------------------+-------------------------------------------------+---------+--------+
|20220402132927590  |20220402132927590_0_1|id:java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]|                      |1a06106e-5e7a-4e68-9ebb-a0dceab70d87-0_0-12-1005_20220402132927590.parquet|[03 45 6A 00 00 00 00 00 00 00 00 00 00 00 00 00]|Mary Jane|20220401|
+-------------------+---------------------+-----------------------------------------------+----------------------+--------------------------------------------------------------------------+-------------------------------------------------+---------+--------+
{code}
> hudi doesn't support bytes column as primary key
> ------------------------------------------------
>
> Key: HUDI-3818
> URL: https://issues.apache.org/jira/browse/HUDI-3818
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: rex xiong
> Assignee: rex xiong
> Priority: Minor
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)