[ 
https://issues.apache.org/jira/browse/HUDI-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenning Ding updated HUDI-1181:
-------------------------------
    Description: 
When a *fixed_len_byte_array* decimal column is used as the Hudi record key, Hudi does not 
display the decimal value correctly; instead, it displays the raw byte array.

During the write phase, Hudi converts the Parquet source data into an Avro GenericRecord. 
For example, suppose the source Parquet data has a column with a decimal type:

{code:java}
optional fixed_len_byte_array(16) OBJ_ID (DECIMAL(38,0));{code}

Hudi then converts it into the following Avro decimal type:
{code:java}
{
    "name" : "LN_LQDN_OBJ_ID",
    "type" : [ {
      "type" : "fixed",
      "name" : "fixed",
      "namespace" : "hoodie.hudi_ln.hudi_ln_record.OBJ_ID",
      "size" : 16,
      "logicalType" : "decimal",
      "precision" : 38,
      "scale" : 0
    }, "null" ]
}
{code}
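For illustration, writing such a value means encoding the decimal's unscaled value as a 
big-endian two's-complement byte array, sign-extended to the fixed size declared in the 
schema (16 bytes here, per the Avro specification for fixed decimals). A minimal 
stdlib-only sketch (the class and method names are hypothetical, not Hudi's actual code):

```java
import java.math.BigDecimal;
import java.util.Arrays;

public class DecimalToFixed {
    // Encode a decimal into an Avro-style fixed(N): big-endian two's-complement
    // unscaled value, sign-extended to the fixed size. Assumes the value's
    // precision fits into `size` bytes.
    static byte[] toFixedBytes(BigDecimal value, int size) {
        byte[] unscaled = value.unscaledValue().toByteArray();
        byte[] fixed = new byte[size];
        // Sign-extend: pad with 0x00 for non-negative values, 0xFF for negative.
        byte pad = (byte) (value.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(fixed, pad);
        System.arraycopy(unscaled, 0, fixed, size - unscaled.length, unscaled.length);
        return fixed;
    }

    public static void main(String[] args) {
        // Scale 0 per DECIMAL(38,0): the unscaled value equals the decimal itself.
        byte[] out = toFixedBytes(new BigDecimal("422076345"), 16);
        System.out.println(Arrays.toString(out));
        // prints [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]
    }
}
```

Note that the encoded bytes match the array shown in the _hoodie_record_key example below.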
This decimal field is stored as a fixed-length byte array. In the read phase, Hudi 
converts the byte array back into a readable decimal value through the corresponding 
Avro decimal converter.

However, the problem is that when a decimal field is set as the record key, Hudi reads 
the value from the Avro GenericRecord and directly converts it into a String (see 
[here|https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java#L76]).

As a result, the _hoodie_record_key field shows something like: 
LN_LQDN_OBJ_ID:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71]. We need to handle 
this special case and convert the byte array back into a decimal before converting it to 
a String.
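The proposed fix boils down to detecting a fixed-length decimal in the key field and 
decoding it (big-endian two's-complement unscaled value plus the schema's scale) before 
stringifying. A minimal stdlib-only sketch of that conversion; in Hudi the value would be 
an Avro GenericFixed carrying a decimal logical type, so the byte[] check and explicit 
scale parameter here are stand-ins, and the names are hypothetical:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class RecordKeyDemo {
    // Decode the key field before String conversion when it holds a
    // fixed-length decimal's raw bytes; otherwise fall back to the
    // current behavior (String.valueOf).
    static String keyToString(Object fieldValue, int scale) {
        if (fieldValue instanceof byte[]) {
            // BigInteger(byte[]) interprets the array as a big-endian
            // two's-complement unscaled value, matching Avro's fixed decimal.
            BigInteger unscaled = new BigInteger((byte[]) fieldValue);
            return new BigDecimal(unscaled, scale).toPlainString();
        }
        return String.valueOf(fieldValue);
    }

    public static void main(String[] args) {
        // The byte array from the report; DECIMAL(38,0) means scale 0.
        byte[] raw = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25, 40, 95, -71};
        System.out.println("LN_LQDN_OBJ_ID:" + keyToString(raw, 0));
        // prints LN_LQDN_OBJ_ID:422076345
    }
}
```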

  was:
When using ```fixed_len_byte_array``` decimal type as Hudi record key, Hudi 
would not correctly display the decimal value, instead, Hudi would display it 
as a byte array.

During the Hudi writing phase, Hudi would save the parquet source data into 
Avro Generic Record. For example, the source parquet data has a column with 
decimal type:

{

optional fixed_len_byte_array(16) LN_LQDN_OBJ_ID (DECIMAL(38,0));
 }
 Then Hudi will convert it into the following avro decimal type:


> Decimal type display issue for record key field
> -----------------------------------------------
>
>                 Key: HUDI-1181
>                 URL: https://issues.apache.org/jira/browse/HUDI-1181
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Wenning Ding
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
