[
https://issues.apache.org/jira/browse/HIVE-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941914#comment-15941914
]
Pengcheng Xiong commented on HIVE-13943:
----------------------------------------
I am deferring this to Hive 3.0 as we are going to cut the first RC and it is
not marked as blocker. Please feel free to commit to the branch if this can be
resolved before the release.
> Null is inserted into an existing partition after replacing a column of int
> with string group type of text formatted partitioned table
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-13943
> URL: https://issues.apache.org/jira/browse/HIVE-13943
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.2.1, 2.1.0
> Reporter: Takahiko Saito
> Priority: Critical
>
> Create a text formatted table with a int column partitioned by a string
> column.
> After replacing the columns of int with string and inserting a new row with
> the existing partition ('horton' in this case), null is inserted as a value
> in the altered column:
> {noformat}
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> drop table if exists test;
> No rows affected (0.249 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> create table test (i int)
> partitioned by (s string);
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test
> partition (s) values(1, 'horton');
> INFO : Session is already open
> INFO : Dag name: insert into table test partition...'horton')(Stage-1)
> INFO :
> INFO : Status: Running (Executing on YARN cluster with App id
> application_1464727816747_0759)
> INFO : Map 1: 0/1
> INFO : Map 1: 0(+1)/1
> INFO : Map 1: 0(+1)/1
> INFO : Map 1: 1/1
> INFO : Loading data to table default.test partition (s=null) from
> hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-09-46_619_6747668726633461908-940/-ext-10000
> INFO : Time taken for load dynamic partitions : 122
> INFO : Loading partition {s=horton}
> INFO : Time taken for adding to write entity : 0
> INFO : Partition default.test{s=horton} stats: [numFiles=1, numRows=1,
> totalSize=2, rawDataSize=1]
> No rows affected (8.301 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test;
> +---------+---------+--+
> | test.i | test.s |
> +---------+---------+--+
> | 1 | horton |
> +---------+---------+--+
> 1 row selected (0.184 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> alter table test replace
> columns (i string);
> No rows affected (0.138 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test
> partition (s) values ('horton', 'horton');
> INFO : Session is already open
> INFO : Dag name: insert into table test partition...'horton')(Stage-1)
> INFO :
> INFO : Map 1: 1/1
> INFO : Loading data to table default.test partition (s=null) from
> hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-09-55_295_8631413609124947877-940/-ext-10000
> INFO : Time taken for load dynamic partitions : 190
> INFO : Loading partition {s=horton}
> INFO : Time taken for adding to write entity : 0
> INFO : Partition default.test{s=horton} stats: [numFiles=2, numRows=2,
> totalSize=9, rawDataSize=7]
> No rows affected (1.35 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181>
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test;
> +---------+---------+--+
> | test.i | test.s |
> +---------+---------+--+
> | 1 | horton |
> | NULL | horton |
> +---------+---------+--+
> 2 rows selected (0.08 seconds)
> {noformat}
> The below is explain of insertion:
> {noformat}
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> explain insert into table test
> partition (s) values ('horton', 'horton');
> +-------------------------------------------------------------------------------------------------------+--+
> | Explain
> |
> +-------------------------------------------------------------------------------------------------------+--+
> | STAGE DEPENDENCIES:
> |
> | Stage-1 is a root stage
> |
> | Stage-2 depends on stages: Stage-1
> |
> | Stage-0 depends on stages: Stage-2
> |
> | Stage-3 depends on stages: Stage-0
> |
> |
> |
> | STAGE PLANS:
> |
> | Stage: Stage-1
> |
> | Tez
> |
> | DagId: hive_20160603211130_4262d739-5bc1-4be0-95c3-2b666f5db7b8:1323
> |
> | Vertices:
> |
> | Map 1
> |
> | Map Operator Tree:
> |
> | TableScan
> |
> | alias: values__tmp__table__17
> |
> | Statistics: Num rows: 1 Data size: 14 Basic stats:
> COMPLETE Column stats: NONE |
> | Select Operator
> |
> | expressions: tmp_values_col1 (type: string),
> tmp_values_col2 (type: string) |
> | outputColumnNames: _col0, _col1
> |
> | Statistics: Num rows: 1 Data size: 14 Basic stats:
> COMPLETE Column stats: NONE |
> | File Output Operator
> |
> | compressed: false
> |
> | Statistics: Num rows: 1 Data size: 14 Basic stats:
> COMPLETE Column stats: NONE |
> | table:
> |
> | input format:
> org.apache.hadoop.mapred.TextInputFormat |
> | output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> | serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
> | name: default.test
> |
> |
> |
> | Stage: Stage-2
> |
> | Dependency Collection
> |
> |
> |
> | Stage: Stage-0
> |
> | Move Operator
> |
> | tables:
> |
> | partition:
> |
> | s
> |
> | replace: false
> |
> | table:
> |
> | input format: org.apache.hadoop.mapred.TextInputFormat
> |
> | output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> |
> | name: default.test
> |
> |
> |
> | Stage: Stage-3
> |
> | Stats-Aggr Operator
> |
> |
> |
> +-------------------------------------------------------------------------------------------------------+--+
> {noformat}
> The issue is NOT seen with orc table.
> Also I tried with another table partitioned by timestamp, but the issue was
> NOT seen:
> {noformat}
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> drop table if exists test;
> No rows affected (0.266 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> create table test (i int)
> partitioned by (ts timestamp);
> No rows affected (0.16 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test
> partition (ts) values (1, '2034-08-04 17:42:59.0');
> INFO : Session is already open
> INFO : Dag name: insert into table test partit...17:42:59.0')(Stage-1)
> INFO :
> INFO : Status: Running (Executing on YARN cluster with App id
> application_1464727816747_0760)
> INFO : Map 1: 0/1
> INFO : Map 1: 0(+1)/1
> INFO : Map 1: 1/1
> INFO : Loading data to table default.test partition (ts=null) from
> hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-44-11_737_2024697703048602688-940/-ext-10000
> INFO : Time taken for load dynamic partitions : 170
> INFO : Loading partition {ts=2034-08-04 17:42:59.0}
> INFO : Time taken for adding to write entity : 1
> INFO : Partition default.test{ts=2034-08-04 17:42:59.0} stats: [numFiles=1,
> numRows=1, totalSize=2, rawDataSize=1]
> No rows affected (5.029 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test;
> +---------+------------------------+--+
> | test.i | test.ts |
> +---------+------------------------+--+
> | 1 | 2034-08-04 17:42:59.0 |
> +---------+------------------------+--+
> 1 row selected (0.158 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> alter table test replace
> columns (s string);
> No rows affected (0.14 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test
> partition (ts) values ('2', '2034-08-04 17:42:59.0');
> INFO : Session is already open
> INFO : Dag name: insert into table test partit...17:42:59.0')(Stage-1)
> INFO :
> INFO : Status: Running (Executing on YARN cluster with App id
> application_1464727816747_0760)
> INFO : Map 1: 0/1
> INFO : Map 1: 0(+1)/1
> INFO : Map 1: 0/1
> INFO : Map 1: 1/1
> INFO : Loading data to table default.test partition (ts=null) from
> hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-44-45_488_1120953773256187058-940/-ext-10000
> INFO : Time taken for load dynamic partitions : 254
> INFO : Loading partition {ts=2034-08-04 17:42:59.0}
> INFO : Time taken for adding to write entity : 1
> INFO : Partition default.test{ts=2034-08-04 17:42:59.0} stats: [numFiles=2,
> numRows=2, totalSize=4, rawDataSize=2]
> No rows affected (7.127 seconds)
> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test;
> +---------+------------------------+--+
> | test.s | test.ts |
> +---------+------------------------+--+
> | 1 | 2034-08-04 17:42:59.0 |
> | 2 | 2034-08-04 17:42:59.0 |
> +---------+------------------------+--+
> 2 rows selected (0.115 seconds)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)