Csaba Ringhofer created IMPALA-12927:
----------------------------------------
Summary: Support reading BINARY columns in JSON tables
Key: IMPALA-12927
URL: https://issues.apache.org/jira/browse/IMPALA-12927
Project: IMPALA
Issue Type: Sub-task
Components: Backend
Reporter: Csaba Ringhofer
Impala currently cannot correctly read BINARY columns in JSON files written by Hive.
Every non-null, non-empty value is returned as NULL, and a conversion error is reported for each of them:
{code}
select * from functional_json.binary_tbl;
+----+--------------+------------+
| id | string_col   | binary_col |
+----+--------------+------------+
| 1  | ascii        | NULL       |
| 2  | ascii        | NULL       |
| 3  | null         | NULL       |
| 4  | empty        |            |
| 5  | valid utf8   | NULL       |
| 6  | valid utf8   | NULL       |
| 7  | invalid utf8 | NULL       |
| 8  | invalid utf8 | NULL       |
+----+--------------+------------+
WARNINGS: Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: 'binary1'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before offset: 481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: 'binary2'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before offset: 481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: 'árvíztűrőtükörfúró'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before offset: 481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: '你好hello'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before offset: 481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: '��'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before offset: 481
Error converting column: functional_json.binary_tbl.binary_col, type: STRING, data: '�D3"'
Error parsing row: file: hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0, before offset: 481
{code}
The single file in the table looks like this:
{code}
hdfs://localhost:20500/test-warehouse/binary_tbl_json/000000_0
{"id":1,"string_col":"ascii","binary_col":"binary1"}
{"id":2,"string_col":"ascii","binary_col":"binary2"}
{"id":3,"string_col":"null","binary_col":null}
{"id":4,"string_col":"empty","binary_col":""}
{"id":5,"string_col":"valid utf8","binary_col":"árvíztűrőtükörfúró"}
{"id":6,"string_col":"valid utf8","binary_col":"你好hello"}
{"id":7,"string_col":"invalid utf8","binary_col":"\u0000�\u0000�"}
{"id":8,"string_col":"invalid utf8","binary_col":"�D3\"\u0011\u0000"}
{code}
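The expected read semantics can be sketched in Python. This is a minimal sketch under stated assumptions, not Impala's implementation (the Impala scanner is C++). It assumes that Hive stores each BINARY value as the bytes of a JSON string, as the sample file suggests (not base64), that those bytes may be invalid UTF-8, and that escapes such as \u0000 stand for the UTF-8 encoding of the code point. The helper name read_binary_column is hypothetical. The 0xFF byte in the test data is a stand-in, because the actual invalid bytes of rows 7 and 8 cannot be recovered from the report.

{code:python}
import json
from typing import Optional


def read_binary_column(line: bytes, column: str) -> Optional[bytes]:
    """Return the raw bytes of a BINARY column from one JSON line (sketch).

    Assumption: the value is a JSON string whose bytes are the binary payload,
    and JSON unicode escapes map to the UTF-8 encoding of their code point.
    """
    # 'surrogateescape' maps each undecodable byte 0x80-0xFF to a lone
    # surrogate U+DC80-U+DCFF, so invalid UTF-8 survives json.loads.
    text = line.decode("utf-8", errors="surrogateescape")
    value = json.loads(text).get(column)
    if value is None:
        # JSON null (or a missing key) becomes SQL NULL.
        return None
    # Encoding with the same error handler restores the original raw bytes.
    return value.encode("utf-8", errors="surrogateescape")


# Usage on lines shaped like the sample file (row 8 uses a hypothetical 0xFF byte).
rows = [
    b'{"id":1,"string_col":"ascii","binary_col":"binary1"}',
    b'{"id":3,"string_col":"null","binary_col":null}',
    b'{"id":4,"string_col":"empty","binary_col":""}',
    b'{"id":8,"string_col":"invalid utf8","binary_col":"\xffD3\\"\\u0011\\u0000"}',
]
for row in rows:
    print(read_binary_column(row, "binary_col"))
{code}

With this approach, row 3 yields None and row 4 yields an empty byte string, which matches the NULL and empty values Impala already returns for those rows. The other rows yield their raw bytes instead of NULL.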
--
This message was sent by Atlassian Jira
(v8.20.10#820010)