[
https://issues.apache.org/jira/browse/HIVE-18137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264656#comment-16264656
]
Ashutosh Chauhan commented on HIVE-18137:
-----------------------------------------
The following rules are supposed to be followed for schema evolution:
* Partitions, when they are created, get the current table schema as their
schema.
* There is no way to alter a partition's schema, except via {{cascade}}, which
is supposed to alter the schema of all partitions so that they get the same
schema as the current table schema.
* At query time, results are returned per the table schema: each partition is
read with its own schema and then coerced into the table schema.
Keeping the above in mind: in your example the partition schema is not altered,
so the partition is read per its old schema. That means that even if you insert
values for the new column, the partition schema doesn't know about it, so the
new column is ignored while reading the partition. But since the table schema
contains it, we add NULL for it after the partition has been read, while
coercing the rows to match the table schema. So the current behavior is
considered correct.
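One way to check this on a given cluster is to compare the column list the
metastore holds for the table with the one it holds for the partition. This is
a minimal sketch using the table from the issue description. It assumes that
describing a partition reports that partition's own column list, which is worth
verifying on your Hive version.
{code}
-- Table-level schema: expected to list column c after the ALTER TABLE.
DESCRIBE part_add_int_permute_select;
-- Partition-level schema: per the rules above, expected to still lack c,
-- because the ALTER TABLE was run without cascade.
DESCRIBE part_add_int_permute_select PARTITION (part=1);
{code}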
On the other hand, if you had altered the table schema using {{cascade}}, the
existing partition schema would also have been updated and the partition would
be read per this new schema. The new column would then be read, and the result
set would match your second result set: one row with NULL and the other with
3333.
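For illustration, here is a sketch of the same scenario with {{cascade}}. It
assumes the table from the issue description is recreated from scratch with the
same session settings; only the ALTER TABLE statement differs.
{code}
CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING)
PARTITIONED BY(part INT);
insert into table part_add_int_permute_select partition(part=1)
VALUES (1, 1111, 'new');
-- CASCADE also applies the column change to the metadata of existing
-- partitions, so part=1 learns about column c.
alter table part_add_int_permute_select add columns(c int) cascade;
insert into table part_add_int_permute_select partition(part=1)
VALUES (2, 2222, 'new', 3333);
select insert_num,part,a,b,c from part_add_int_permute_select;
{code}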
Now, this is how it is *supposed* to work. But since we have different code
paths for self-describing file formats like ORC vs. others like text, if you
get different behavior in some corner cases, that will be considered a bug.
> Schema evolution: newly inserted column value in pre-existing partition is
> masked to null
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-18137
> URL: https://issues.apache.org/jira/browse/HIVE-18137
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltan Haindrich
>
> {code}
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set hive.mapred.mode=nonstrict;
> set hive.cli.print.header=true;
> SET hive.exec.schema.evolution=true;
> SET hive.vectorized.use.vectorized.input.format=true;
> SET hive.vectorized.use.vector.serde.deserialize=false;
> SET hive.vectorized.use.row.serde.deserialize=false;
> SET hive.vectorized.execution.enabled=false;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.metastore.disallow.incompatible.col.type.changes=true;
> set hive.default.fileformat=textfile;
> set hive.llap.io.enabled=false;
> CREATE TABLE part_add_int_permute_select(insert_num int, a INT, b STRING)
> PARTITIONED BY(part INT);
> insert into table part_add_int_permute_select partition(part=1) VALUES (1,
> 1111, 'new');
> alter table part_add_int_permute_select add columns(c int);
> insert into table part_add_int_permute_select partition(part=1) VALUES (2,
> 2222, 'new', 3333);
> select insert_num,part,a,b,c from part_add_int_permute_select;
> {code}
> results for the last select:
> {code}
> 1 1 1111 new NULL
> 2 1 2222 new NULL
> {code}
> I think the following result should be expected:
> {code}
> 1 1 1111 new NULL
> 2 1 2222 new 3333
> {code}