[jira] [Updated] (HIVE-21428) field delimiter property set at partition level is not getting respected when schema evolution/vectorized execution is enabled

Ganesha Shreedhara (JIRA) Sun, 31 Mar 2019 20:11:28 -0700


     [ https://issues.apache.org/jira/browse/HIVE-21428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ganesha Shreedhara updated HIVE-21428:
--------------------------------------
    Summary: field delimiter property set at partition level is not getting 
respected when schema evolution/vectorized execution is enabled  (was: field 
delimiter property set at partition level is not getting respected when schema 
evolution is enabled)

> field delimiter property set at partition level is not getting respected when 
> schema evolution/vectorized execution is enabled
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21428
>                 URL: https://issues.apache.org/jira/browse/HIVE-21428
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Ganesha Shreedhara
>            Priority: Major
>
> *Steps to reproduce:*
> – Create a partitioned table
> {code:java}
> create external table src (c1 string, c2 string, c3 string) partitioned by 
> (part string)
> location '/tmp/src';
> {code}
>  
> – Create a data file containing data in only 2 columns, separated by a tab, 
> and put it in the table's external location
> {code:java}
> printf 'd1\td2\n' >> data.txt;
> hadoop fs -mkdir -p /tmp/src/part=part1;
> hadoop fs -put data.txt /tmp/src/part=part1/;
> {code}
>  
> – Recover the partition metadata
> {code:java}
> MSCK REPAIR TABLE src;{code}
>  
> – Alter the partition's SerDe properties so that the field delimiter is tab ('\t')
> {code:java}
> ALTER TABLE src PARTITION (part='part1')
> SET SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('columns'='c1,c2', 'columns.types'='string,string', 
> 'field.delim'='\t'); 
> {code}
>  
> – Now write the data from the src table to a dest table
> {code:java}
> create table dest (c1 string, c2 string, c3 string, c4 string);
> insert overwrite table dest select * from src; 
> {code}
>  
> – Retrieve the data from the dest table
> {code:java}
> select * from dest; {code}
>  
> *Result (wrong):*
> d1 d2 NULL NULL part1
>  
> – Now disable schema evolution, write the data from the src table to the dest 
> table again, and retrieve it
> {code:java}
> set hive.exec.schema.evolution=false;
> insert overwrite table dest select * from src;
> select * from dest;
> {code}
>  
> *Result (correct):*
> d1 d2 NULL part1
>  
> This happens because "d1\td2" is treated as a single column: the field 
> delimiter used by the deserializer is the default *^A* instead of the *\t* 
> set at the partition level.
> It works correctly if I alter the SerDe field delimiter for the entire 
> table instead.
> So it looks like the SerDe properties in TableDesc are taking precedence over 
> the SerDe properties in PartitionDesc. The issue occurs only when 
> hive.exec.schema.evolution is enabled (it is enabled by default), and it is 
> not present in 2.x versions. 
>  
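The precedence explanation in the description can be modelled in a few lines. This is a hypothetical Java sketch, not Hive's code: it does not use the real TableDesc/PartitionDesc classes, and `overlay`, `deserialize`, and `render` are illustrative names. It assumes, for illustration only, that the table-level properties resolve to the default ^A delimiter, and it shows that the merge order alone reproduces both the wrong and the correct rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical model, not Hive's implementation: it only shows how the merge order of
// table-level and partition-level SerDe properties decides which field delimiter is used.
class DelimiterPrecedenceSketch {

    // Default field delimiter for Hive text tables: Ctrl-A (octal 001).
    static final String DEFAULT_DELIM = "\u0001";

    // Returns a copy of 'base' in which every key from 'top' overrides it.
    static Properties overlay(Properties base, Properties top) {
        Properties merged = new Properties();
        merged.putAll(base);
        merged.putAll(top);
        return merged;
    }

    // Splits one text line on the resolved field.delim and pads missing columns with null
    // (simplified: no escaping, no nested types; extra fields are ignored).
    static List<String> deserialize(String line, Properties props, int numCols) {
        String delim = props.getProperty("field.delim", DEFAULT_DELIM);
        String[] parts = line.split(Pattern.quote(delim), -1);
        List<String> row = new ArrayList<>();
        for (int i = 0; i < numCols; i++) {
            row.add(i < parts.length ? parts[i] : null);
        }
        return row;
    }

    // Formats a row like the CLI output: NULL for missing values, partition value appended.
    static String render(List<String> row, String partValue) {
        List<String> cells = new ArrayList<>();
        for (String v : row) {
            cells.add(v == null ? "NULL" : v);
        }
        cells.add(partValue);
        return String.join("\t", cells);
    }

    public static void main(String[] args) {
        // Assumption for illustration: table-level properties resolve to the default ^A.
        Properties tableProps = new Properties();
        tableProps.setProperty("field.delim", DEFAULT_DELIM);
        // Partition-level override, as set by ALTER TABLE ... PARTITION ... SET SERDEPROPERTIES.
        Properties partProps = new Properties();
        partProps.setProperty("field.delim", "\t");

        String line = "d1\td2"; // the single line written to data.txt
        // Expected precedence: partition-level properties overlay table-level ones.
        List<String> expected = deserialize(line, overlay(tableProps, partProps), 3);
        // Reported behavior: table-level properties win, so the whole line lands in c1.
        List<String> reported = deserialize(line, overlay(partProps, tableProps), 3);

        System.out.println("expected: " + render(expected, "part1"));
        System.out.println("reported: " + render(reported, "part1"));
    }
}
```

In the "reported" row, c1 holds the whole string "d1\td2", so the tab-separated output shows d1 and d2 followed by two NULLs, matching the wrong result in the description.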



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
