[ https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenning Ding updated HUDI-1181:
-------------------------------
Description:
When a *fixed_len_byte_array* decimal column is used as the Hudi record key,
Hudi does not display the decimal value correctly; instead, it displays the
value as a byte array.
During the write phase, Hudi converts the Parquet source data into Avro
GenericRecords. For example, suppose the source Parquet data has a column with
a decimal type:
{code:java}
optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code}
Hudi then converts it into the following Avro decimal type:
{code:java}
{
"name" : "OBJ_ID",
"type" : [ {
"type" : "fixed",
"name" : "fixed",
"namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID",
"size" : 16,
"logicalType" : "decimal",
"precision" : 38,
"scale" : 0
}, "null" ]
}
{code}
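Both schemas reserve 16 bytes for a precision-38 value. As a quick sanity check (not part of Hudi), the Parquet format specification bounds the number of decimal digits a fixed_len_byte_array(n) can hold by floor(log10(2^(8n-1) - 1)). The helper below, whose class and method names are hypothetical, evaluates that bound with BigInteger:
{code:java}
import java.math.BigInteger;

// Hypothetical helper, not Hudi code: largest decimal precision that a signed,
// n-byte two's-complement integer can always hold.
final class FixedDecimalCapacity {

    static int maxPrecision(int numBytes) {
        // Largest positive value representable in numBytes bytes: 2^(8n-1) - 1.
        BigInteger maxUnscaled = BigInteger.ONE.shiftLeft(8 * numBytes - 1).subtract(BigInteger.ONE);
        // If that value has d digits, every (d-1)-digit number fits but not every
        // d-digit one, so the usable precision is d - 1.
        return maxUnscaled.toString().length() - 1;
    }

    public static void main(String[] args) {
        System.out.println(maxPrecision(16)); // prints 38
    }
}
{code}
For n = 16 the bound is 38, while 15 bytes would only allow 35 digits, so 16 bytes is the smallest width that fits DECIMAL(38,0).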
This decimal field is stored as a fixed-length byte array. In the read phase,
Hudi converts this byte array back into a readable decimal value through this
converter.
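Per the Avro specification, a decimal stored as fixed holds the unscaled value as a big-endian two's-complement integer, and the scale comes from the schema. The following is a minimal JDK-only sketch of that decoding, not Hudi's converter; the class and method names are hypothetical. It decodes the 16 bytes quoted in the record-key example of this issue, with scale 0 from DECIMAL(38,0):
{code:java}
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper, not Hudi code: decodes an Avro "fixed" decimal payload.
final class FixedDecimalDecoder {

    // BigInteger(byte[]) reads big-endian two's-complement bytes, which matches
    // how the Avro decimal logical type stores the unscaled value.
    static BigDecimal decode(byte[] fixedBytes, int scale) {
        return new BigDecimal(new BigInteger(fixedBytes), scale);
    }

    public static void main(String[] args) {
        // The bytes shown in the _hoodie_record_key example of this issue.
        byte[] raw = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71};
        System.out.println(decode(raw, 0).toPlainString()); // prints 422076345
    }
}
{code}
The trailing bytes 25, 40, 95, -71 are 0x19, 0x28, 0x5F, 0xB9, i.e. 0x19285FB9 = 422076345, which is the decimal value the record key should have shown.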
However, when a decimal field is used as the record key, Hudi reads the value
from the Avro GenericRecord and converts it directly into a String (see
[here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]).
As a result, the _hoodie_record_key field shows something like:
LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. We
therefore need to handle this special case by converting the byte array back
into a decimal before converting it to a String.
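One way to handle the special case is to check whether the key field carries a decimal logical type and, if so, decode the fixed bytes into a BigDecimal before building the key string. The sketch below is hypothetical and JDK-only, not the actual Hudi fix: a plain byte[] stands in for Avro's GenericData.Fixed (whose toString() appears to print the bytes in Arrays.toString form, which would explain the output above), and a nullable decimalScale parameter stands in for the scale read from the field's decimal logical type. The field:value layout simply mirrors the example key in this issue.
{code:java}
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical sketch, not Hudi code. byte[] stands in for GenericData.Fixed;
// decimalScale is null when the key field has no decimal logical type.
final class RecordKeyFormatter {

    // Mimics the reported behavior: the raw bytes are stringified as-is.
    static String naiveKeyPart(String field, byte[] value) {
        return field + ":" + Arrays.toString(value);
    }

    // Decodes decimal-typed fixed bytes first, then builds the key string.
    static String keyPart(String field, Object value, Integer decimalScale) {
        if (value instanceof byte[] && decimalScale != null) {
            BigDecimal decimal = new BigDecimal(new BigInteger((byte[]) value), decimalScale);
            return field + ":" + decimal.toPlainString();
        }
        return field + ":" + value;
    }

    public static void main(String[] args) {
        byte[] raw = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71};
        // LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]
        System.out.println(naiveKeyPart("LN_LQDN_OBJ_ID", raw));
        // LN_LQDN_OBJ_ID:422076345
        System.out.println(keyPart("LN_LQDN_OBJ_ID", raw, 0));
    }
}
{code}
In the real code path the scale would come from the field's schema rather than a method parameter; the sketch only illustrates decoding before stringifying.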
was:
When a *fixed_len_byte_array* decimal column is used as the Hudi record key,
Hudi does not display the decimal value correctly; instead, it displays the
value as a byte array.
During the write phase, Hudi converts the Parquet source data into Avro
GenericRecords. For example, suppose the source Parquet data has a column with
a decimal type:
{code:java}
optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code}
Hudi then converts it into the following Avro decimal type:
{code:java}
{
"name" : "LN_LQDN_OBJ_ID",
"type" : [ {
"type" : "fixed",
"name" : "fixed",
"namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID",
"size" : 16,
"logicalType" : "decimal",
"precision" : 38,
"scale" : 0
}, "null" ]
}
{code}
This decimal field is stored as a fixed-length byte array. In the read phase,
Hudi converts this byte array back into a readable decimal value through this
converter.
However, when a decimal field is used as the record key, Hudi reads the value
from the Avro GenericRecord and converts it directly into a String (see
[here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]).
As a result, the _hoodie_record_key field shows something like:
LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. We
therefore need to handle this special case by converting the byte array back
into a decimal before converting it to a String.
> Decimal type display issue for record key field
> -----------------------------------------------
>
> Key: HUDI-1181
> URL: https://issues.apache.org/jira/browse/HUDI-1181
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Wenning Ding
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)