[ https://issues.apache.org/jira/browse/HIVE-21428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ganesha Shreedhara updated HIVE-21428:
--------------------------------------
Summary: field delimiter property set at partition level is not getting
respected when schema evolution/vectorized execution is enabled (was: field
delimiter property set at partition level is not getting respected when schema
evolution is enabled)
> field delimiter property set at partition level is not getting respected when
> schema evolution/vectorized execution is enabled
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-21428
> URL: https://issues.apache.org/jira/browse/HIVE-21428
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: Ganesha Shreedhara
> Priority: Major
>
> *Steps to reproduce:*
> – create a partitioned table
> {code:java}
> create external table src (c1 string, c2 string, c3 string) partitioned by
> (part string)
> location '/tmp/src';
> {code}
>
> – Create a data file that has values for only 2 columns, separated by a tab,
> and put it in the table's external location
> {code:java}
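> # printf always expands \t to a real tab; plain echo does not in every shell
> printf 'd1\td2\n' >> data.txt
> hdfs dfs -mkdir -p /tmp/src/part=part1
> hdfs dfs -put data.txt /tmp/src/part=part1/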
> {code}
>
> – Recover data
> {code:java}
> MSCK REPAIR TABLE src;{code}
>
> – Alter partition's property to have field delimiter as tab ('\t')
> {code:java}
> ALTER TABLE src PARTITION (part='part1')
> SET SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('columns'='c1,c2', 'column.types' ='string,string',
> 'field.delim'='\t');
> {code}
>
> – Now write the data from src table to a dest table
> {code:java}
> create table dest (c1 string, c2 string, c3 string, c4 string);
> insert overwrite table dest select * from src;
> {code}
>
> – Retrieve data from dest table
> {code:java}
> select * from dest; {code}
>
> *Result* (Wrong)*:*
> d1 d2 NULL NULL part1
> (c1 holds the whole string "d1\td2"; c2 and c3 are NULL)
>
> – Now disable schema evolution, write data again from src table to dest table
> and retrieve the data
> {code:java}
> set hive.exec.schema.evolution=false;
> insert overwrite table dest select * from src;
> select * from dest;
> {code}
>
> *Result* (Correct)*:*
> d1 d2 NULL part1
>
> This happens because "d1\td2" is treated as a single column: the field
> delimiter used by the deserializer is *^A* rather than the *\t* that is set at
> the partition level.
> It works correctly if I alter the serde's field delimiter for the entire
> table.
> So it looks like the serde properties in TableDesc take precedence over the
> serde properties in PartitionDesc. The issue occurs only when
> hive.exec.schema.evolution is enabled (it is enabled by default), and it is
> not present in 2.x versions.
>
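
The delimiter mismatch described in the report can be illustrated outside Hive. The following is only a minimal shell sketch, not Hive code: awk stands in for the deserializer, and the variable names are made up for this illustration. It splits the sample line once on ^A (byte 0x01, the default field delimiter of LazySimpleSerDe) and once on tab, and prints how many fields each split yields.

```shell
#!/bin/sh
# One line of the sample data file: two values separated by a tab.
line=$(printf 'd1\td2')

# Candidate field delimiters: Ctrl-A (0x01, the table-level default)
# and tab (the value set on the partition in the report).
ctrl_a=$(printf '\001')
tab=$(printf '\t')

# Count the fields awk sees with each separator (awk is only a stand-in
# for the deserializer here).
fields_ctrl_a=$(printf '%s\n' "$line" | awk -F "$ctrl_a" '{ print NF }')
fields_tab=$(printf '%s\n' "$line" | awk -F "$tab" '{ print NF }')

echo "split on Ctrl-A: $fields_ctrl_a field(s)"
echo "split on tab:    $fields_tab field(s)"
```

Splitting on ^A leaves the line as one field, which corresponds to the wrong result above (the whole string lands in c1), while splitting on tab yields the two fields seen in the correct result.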