darren created HIVE-12314:
-----------------------------
Summary: "insert overwrite" produce redundant directory while
multiple execution
Key: HIVE-12314
URL: https://issues.apache.org/jira/browse/HIVE-12314
Project: Hive
Issue Type: Bug
Affects Versions: 1.1.0, 0.13.0
Reporter: darren
1)Perform the following command for the first time:
INSERT OVERWRITE TABLE dest PARTITION (dt='20151026') SELECT * FROM src;
Once it fails while trying to add partition into meta data,though the data file
has been copied to the table directory.
hdfs dfs -ls -R /user/hive/warehouse/dest/dt=20151026
-rw------- 3 admin hive 65 2015-10-30 19:34
/user/hive/warehouse/dest/dt=20151026/000000_0
0: jdbc:hive2://ha-cluster/default> show partitions dest;
+------------+
| partition |
+------------+
+------------+
No rows selected (0.154 seconds)
2)Perform the "insert overwrite" again:
INSERT OVERWRITE TABLE dest PARTITION (dt='20151026') SELECT * FROM src;
No matter if this time it succeeds or not,the partition directory will get
redundant directory just like the following example:
hdfs dfs -ls -R /user/hive/warehouse/dest/
drwx------ - admin hive 0 2015-10-30 19:36 /user/hive/warehouse/dest/dt=20151026
-rw------- 3 admin hive 65 2015-10-30 19:34
/user/hive/warehouse/dest/dt=20151026/000000_0
drwxrwxrwx - admin hive 0 2015-10-30 19:36
/user/hive/warehouse/dest/dt=20151026/-ext-10000
-rw------- 3 admin hive 65 2015-10-30 19:36
/user/hive/warehouse/dest/dt=20151026/-ext-10000/000000_0
3)This will cause a issue while try to select data from it.
0: jdbc:hive2://ha-cluster/default> select * from dest where dt='20151026';
Error: java.io.IOException: java.io.IOException: Not a file:
hdfs://hacluster/user/hive/warehouse/dest/dt=20151026/-ext-10000 (state=,code=0)
4)This issue turns different result for Hive-0.13 and Hive-1.1.0.
For Hive-0.13,it produces redundant directory.
For Hive-1.10,it generates duplicated data.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)