[
https://issues.apache.org/jira/browse/HIVE-28728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitriy Fingerman updated HIVE-28728:
-------------------------------------
Labels: correctness (was: )
> In INSERT OVERWRITE queries, STR_TO_MAP() UDF is not using UTF-8 encoding
> properly resulting in garbled characters
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-28728
> URL: https://issues.apache.org/jira/browse/HIVE-28728
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 4.0.0, 4.0.1
> Reporter: Paramvir Singh
> Priority: Major
> Labels: correctness
>
> Chinese characters are garbled when an INSERT OVERWRITE query uses the
> STR_TO_MAP() function.
> Repro steps:
> 1. Text data file
> {code:java}
> 100 hive
> 200 spark
> 300 oozie
> 400 airflow
> 500 优惠活动
> {code}
> 2. Create table on top of it
> {code:java}
> CREATE external TABLE t1(
> id string,
> name string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ' '
> STORED AS TEXTFILE
> LOCATION 's3://prmsingh-hive/garbled/rawdata/';
> {code}
> 3. Selecting the data from source table runs fine
> {code:java}
> select STR_TO_MAP(concat(id,":",name),',',':') from t1;
> OK
> {"100":"hive"}
> {"200":"spark"}
> {"300":"oozie"}
> {"400":"airflow"}
> {"500":"优惠活动"}
> {code}
> 4. But after creating another table, running an INSERT OVERWRITE (IOW) query
> to populate it, and selecting from the destination table, the Chinese
> characters come back garbled
> {code:java}
> create external table result3
> (cd MAP<STRING, STRING>)
> location 's3://prmsingh-hive/garbled/result3/';
> insert overwrite table result3 select STR_TO_MAP(concat(id,":",name),',',':')
> from t1;
> hive> select * from result3;
> OK
> {"100":"hive"}
> {"200":"spark"}
> {"300":"oozie"}
> {"400":"airflow"}
> {"500":"????"}
> {code}
>
> But when I create the table and insert the data with vectorization
> disabled, the result is correct.
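>
> A minimal sketch of the vectorization-disabled run described above. It
> assumes the standard session property hive.vectorized.execution.enabled is
> the setting that was switched off (the report does not name it) and reuses
> the t1 and result3 tables from the repro steps:
> {code:sql}
> -- Assumption: vectorization is turned off for this session only
> SET hive.vectorized.execution.enabled=false;
> -- Same IOW statement as in step 4
> insert overwrite table result3
> select STR_TO_MAP(concat(id,":",name),',',':') from t1;
> -- Per the report, the map values come back intact in this mode
> select * from result3;
> {code}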