Paramvir Singh created HIVE-28728:
-------------------------------------
Summary: In INSERT OVERWRITE queries, STR_TO_MAP() UDF is not
using UTF-8 encoding properly resulting in garbled characters
Key: HIVE-28728
URL: https://issues.apache.org/jira/browse/HIVE-28728
Project: Hive
Issue Type: Bug
Components: Vectorization
Affects Versions: 4.0.1, 4.0.0
Reporter: Paramvir Singh
Assignee: Paramvir Singh
Chinese characters are garbled when an INSERT OVERWRITE query writes the
output of the STR_TO_MAP() function to a table.
Repro steps:
1. Text data file
100 hive
200 spark
300 oozie
400 airflow
500 优惠活动
2. Create table on top of it
CREATE external TABLE t1(
id string,
name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
STORED AS TEXTFILE
LOCATION 's3://prmsingh-hive/garbled/rawdata/';
insert into table test1 values ('2010-01-01', '优惠活动');
3. Run STR_TO_MAP() in a plain select; the Chinese value is returned correctly
select STR_TO_MAP(concat(id,":",name),',',':') from t7;
OK
{"100":"hive"}
{"200":"spark"}
{"300":"oozie"}
{"400":"airflow"}
{"500":"优惠活动"}
4. Write the same result with INSERT OVERWRITE; the Chinese value is read back as "????"
create external table result3
(cd MAP<STRING, STRING>)
location 's3://prmsingh-hive/garbled/result3/';
insert overwrite table result3 select
STR_TO_MAP(concat(id,":",name),',',':') from t7;
hive> select * from result3;
OK
{"100":"hive"}
{"200":"spark"}
{"300":"oozie"}
{"400":"airflow"}
{"500":"????"}
However, when I create the table and insert the data with vectorization
disabled, the result is correct.
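The vectorization-disabled run described above can be sketched as follows. This is only a sketch: it assumes vectorization is turned off for the session through the standard hive.vectorized.execution.enabled property (the report does not say which setting was used), and it reuses the table names from the repro steps.

```sql
-- Assumption: vectorization is disabled per session with this property;
-- the report does not state how it was disabled.
SET hive.vectorized.execution.enabled=false;

-- Re-run the failing statement from step 4, then read the result back.
insert overwrite table result3 select
STR_TO_MAP(concat(id,":",name),',',':') from t7;

select * from result3;
```

If the last row then shows the original Chinese value instead of "????", that would be consistent with the problem being limited to the vectorized execution path.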
--
This message was sent by Atlassian Jira
(v8.20.10#820010)