Alan Jackoway created IMPALA-7056:
-------------------------------------
Summary: Changing Text Delimiter Does Not Work
Key: IMPALA-7056
URL: https://issues.apache.org/jira/browse/IMPALA-7056
Project: IMPALA
Issue Type: Bug
Components: Catalog, Docs
Affects Versions: Impala 2.12.0
Reporter: Alan Jackoway
The wording on
https://impala.apache.org/docs/build/html/topics/impala_alter_table.html makes
it seem like you can change the delimiter of text tables after they are created.
I did the following to simulate a table that needed to switch from
comma-delimited to pipe-delimited data:
{code}
hadoop fs -mkdir /user/alanj
hadoop fs -mkdir /user/alanj/test_delim
echo "A,B|C" > delim.txt
hadoop fs -put delim.txt /user/alanj/test_delim
{code}
Then I created the table in Impala and tried to change the delimiter:
{code:sql}
> create external table default.alanj_test_delim(A string, B string) ROW FORMAT
> DELIMITED FIELDS TERMINATED BY "," LOCATION '/user/alanj/test_delim';
> select * from default.alanj_test_delim;
Query: select * from default.alanj_test_delim
+---+-----+
| a | b   |
+---+-----+
| A | B|C |
+---+-----+
> alter table default.alanj_test_delim set SERDEPROPERTIES
> ('serialization.format'='|', 'field.delim'='|');
> select * from default.alanj_test_delim;
+---+-----+
| a | b   |
+---+-----+
| A | B|C |
+---+-----+
> show create table default.alanj_test_delim;
+----------------------------------------------------------------------------------------------------------------------+
| result                                                                                                               |
+----------------------------------------------------------------------------------------------------------------------+
| CREATE EXTERNAL TABLE default.alanj_test_delim (                                                                     |
| a STRING,                                                                                                            |
| b STRING                                                                                                             |
| )                                                                                                                    |
| ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'                                                                        |
| WITH SERDEPROPERTIES ('field.delim'='|', 'serialization.format'='|')                                                 |
| STORED AS TEXTFILE                                                                                                   |
| LOCATION 'hdfs://namenode:8020/user/alanj/test_delim'                                                                |
| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='false', 'numFiles'='0', 'numRows'='-1', 'rawDataSize'='-1', 'totalSize'='0') |
+----------------------------------------------------------------------------------------------------------------------+
{code}
So SHOW CREATE TABLE reports the new SERDEPROPERTIES, but Impala doesn't
actually use them to read the existing data.
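To make the two possible parses concrete, the sample line can be split locally with each delimiter. This is only a sketch using awk on the same string as delim.txt; the variable names are made up for illustration, and it says nothing about how Impala's text scanner actually works.

```shell
# Hypothetical local check; no HDFS or Impala involved.
# Same content as the delim.txt file created above.
line='A,B|C'

# Split on a comma, the delimiter the table was created with.
comma_cols=$(printf '%s\n' "$line" | awk -F',' '{ print $1 " / " $2 }')

# Split on a pipe, the delimiter set through SERDEPROPERTIES.
pipe_cols=$(printf '%s\n' "$line" | awk -F'|' '{ print $1 " / " $2 }')

echo "comma: $comma_cols"   # comma: A / B|C
echo "pipe:  $pipe_cols"    # pipe:  A,B / C
```

The comma split matches what the SELECT returns after the ALTER TABLE, while the pipe split is what the updated SERDEPROPERTIES would imply.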
If you then insert data (as the docs suggest), Impala writes that data with the
new delimiter, and the SELECT that follows parses the original row with the new
delimiter as well:
{code:sql}
> insert into default.alanj_test_delim values('D', 'E,F');
> select * from alanj_test_delim;
+-----+-----+
| a   | b   |
+-----+-----+
| A,B | C   |
| D   | E,F |
+-----+-----+
# hadoop fs -cat /user/alanj/test_delim/a54bb0ec14646492-a738811400000000_1498283208_data.0.
D|E,F
{code}