[jira] [Commented] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null

Ashutosh Chauhan (JIRA) Thu, 23 Nov 2017 09:50:46 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264656#comment-16264656
 ]


Ashutosh Chauhan commented on HIVE-18137:
-----------------------------------------

Following rules suppose to be followed for schema evolution. 
* Partitions when they are created get their schema as current table schema.
* There is no way to alter partition schema.
* Except via {{cascade}} which is suppose to alter schema of all partitions so 
that they get same schema as current table schema.
* At query time, data is read per schema of table. Partitions will be read with 
their own schema and then coerced into table schema.

Keeping in mind above, in your example since partition schema is not altered, 
partition will be read per its old schema which means even if you insert new 
columns, since partition schema doesn't know about it, new columns will be 
ignored while reading partition. But since table schema contains it, we will 
add NULL for it after partition has been read and while coercing it to match 
table schema. So, current behavior will be considered correct.
On the other hand if you have altered table schema using {{cascade}} then 
existing partition schema will also be updated and then partition will be read 
per this new schema so new column will be read and result set will be as per 
your second result set with one row with null and other with 3333.

Now this is how it *suppose* to work but since we have different code paths for 
self describing file formats like orc vs others like text if you get different 
behavior in some corner cases that will be considered bug.

> Schema evolution: newly inserted column value in pre-existing partition is 
> masked to null
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-18137
>                 URL: https://issues.apache.org/jira/browse/HIVE-18137
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>
> {code}
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set hive.mapred.mode=nonstrict;
> set hive.cli.print.header=true;
> SET hive.exec.schema.evolution=true;
> SET hive.vectorized.use.vectorized.input.format=true;
> SET hive.vectorized.use.vector.serde.deserialize=false;
> SET hive.vectorized.use.row.serde.deserialize=false;
> SET hive.vectorized.execution.enabled=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.metastore.disallow.incompatible.col.type.changes=true;
> set hive.default.fileformat=textfile;
> set hive.llap.io.enabled=false;
> CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING) 
> PARTITIONED BY(part INT);
> insert into table part_add_int_permute_select partition(part=1) VALUES (1, 
> 1111, 'new');
> alter table part_add_int_permute_select add columns(c int);
> insert into table part_add_int_permute_select partition(part=1) VALUES (2, 
> 2222, 'new', 3333);
> select insert_num,part,a,b,c from part_add_int_permute_select;
> {code}
> results for the last select:
> {code}
> 1      1       1111    new     NULL
> 2      1       2222    new     NULL
> {code}
> I think the following result should be expected:
> {code}
> 1      1       1111    new     NULL
> 2      1       2222    new     3333
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-18137) Schema evolution: newly inserted column value in pre-existing partition is masked to null

Reply via email to