[ https://issues.apache.org/jira/browse/HIVE-21428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ganesha Shreedhara updated HIVE-21428:
--------------------------------------
    Description:

*Steps to reproduce:*

create external table src (c1 string, c2 string, c3 string) partitioned by 
(part string)

location '/tmp/src';

printf 'd1\td2\n' >> data.txt;

hadoop dfs -mkdir -p /tmp/src/part=part1/;

hadoop dfs -put data.txt /tmp/src/part=part1/;

 

MSCK REPAIR TABLE src;

 

ALTER TABLE src PARTITION (part='part1')

SET SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

WITH SERDEPROPERTIES ('columns'='c1,c2', 'column.types' ='string,string', 
'field.delim'='\t');

 

create table dest (c1 string, c2 string, c3 string, c4 string);

insert overwrite table dest select * from src;

select * from dest;

 

*Result (wrong):*

d1 d2 NULL NULL part1

 

set hive.vectorized.execution.enabled=false;

insert overwrite table dest select * from src;

select * from dest;

 

*Result (correct):*

d1 d2 NULL part1

 

This happens because "d1\td2" is treated as a single column: the deserializer 
splits on the default field delimiter *^A* instead of *\t*, which is set at 
the partition level.

It works correctly if I alter the field delimiter of the serde for the entire table.

So it looks like the serde properties in TableDesc are taking precedence over the 
serde properties in PartitionDesc. The issue occurs only when 
hive.exec.schema.evolution is enabled, and it is not present in 2.x versions.
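
The delimiter mismatch can be illustrated outside Hive with a minimal shell sketch. Here awk only stands in for the serde's field splitting, and count_fields is a hypothetical helper that is not part of Hive. It counts the fields in the sample row when it is split on *^A* and when it is split on *\t*:

```shell
# Hypothetical helper: count the fields in a line for a given
# single-character delimiter. awk is a stand-in for the serde here.
count_fields() {
  # $1 = line of data, $2 = field delimiter
  printf '%s\n' "$1" | awk -F "$2" '{ print NF }'
}

line="$(printf 'd1\td2')"      # the row written to data.txt
ctrl_a="$(printf '\001')"      # ^A, the delimiter the deserializer ends up using
tab="$(printf '\t')"           # the delimiter set at partition level

count_fields "$line" "$ctrl_a" # whole row is seen as one column
count_fields "$line" "$tab"    # row splits into two columns
```

With *^A* the whole row lands in c1 and the remaining columns are NULL, which matches the wrong result above. With *\t* the row splits into c1 and c2 as expected.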

 

 

 

 

 

 


> field delimiter property set at partition level is not getting respected when 
> schema evolution is enabled
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21428
>                 URL: https://issues.apache.org/jira/browse/HIVE-21428
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Ganesha Shreedhara
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
