Junichi Oda created HIVE-18563:
----------------------------------

             Summary: "Load data into table" behavior is different between 1.2.1 and 1.2.1000
                 Key: HIVE-18563
                 URL: https://issues.apache.org/jira/browse/HIVE-18563
             Project: Hive
          Issue Type: Bug
          Components: Hive, HiveServer2
         Environment: * OS : CentOS6
* JDK : 1.8.0_152(Oracle)
* HDP : 2.3.2.0 and 2.6.2.0
* Hive : 1.2.1.2.3.2.0-2950 and 1.2.1000.2.6.2.0-205
            Reporter: Junichi Oda

After upgrading HDP from 2.3.2.0 to 2.6.2.0, the "load data into table" behavior changed.

Data arrives hourly, and all files have the same name.
{code:java}
/user/user1/logs/yyyymmdd/00/part-r-00000.gz
/user/user1/logs/yyyymmdd/01/part-r-00000.gz
/user/user1/logs/yyyymmdd/02/part-r-00000.gz
/user/user1/logs/yyyymmdd/03/part-r-00000.gz
・・・・・・・・・・・・・・・・・・・・・・・
/user/user1/logs/yyyymmdd/22/part-r-00000.gz
/user/user1/logs/yyyymmdd/23/part-r-00000.gz
{code}
Before upgrade (HDP 2.3.2.0)
{code:java}
HQL
hive> load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table sample_db.sample_tbl partition (dt='yyyymmdd');

Result
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_1.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_10.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_11.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_12.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_13.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_14.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_15.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_16.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_17.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_18.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_19.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_2.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_20.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_21.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_22.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_23.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_3.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_4.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_5.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_6.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_7.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_8.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_9.gz
{code}
All files except the first (part-r-00000.gz) were renamed to part-r-00000_copy_*.gz, so all 24 hourly files were kept.

After upgrade (HDP 2.6.2.0)
{code:java}
HQL
hive> load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table sample_db.sample_tbl partition (dt='yyyymmdd');

Result
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
{code}
Only part-r-00000.gz is present. It is the same file as part-r-00000_copy_23.gz in the pre-upgrade result.

When I load the files one at a time, all of them are loaded, as in the HDP 2.3.2.0 environment.

Why is the behavior different between 2.3.2.0 and 2.6.2.0?

Thanks in advance

https://community.hortonworks.com/questions/158176/load-data-into-table-behavior-is-different-between.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
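
Since loading the files one at a time is reported above to keep every file, the sketch below generates one load statement per hourly file instead of a single wildcard load. It is only an illustration of that workaround, not an explanation of the changed behavior. The function name `per_hour_load_statements` is hypothetical, and the sketch assumes each hour directory (00 to 23) holds exactly one `part-r-00000.gz`, as in the listing in this report.

```python
def per_hour_load_statements(date, table="sample_db.sample_tbl",
                             base="/user/user1/logs"):
    """Build one LOAD DATA statement per hourly file (hours 00..23).

    Assumption: every hour directory contains a single part-r-00000.gz,
    matching the directory layout shown in the report.
    """
    statements = []
    for hour in range(24):
        # Hour directories are zero-padded: 00, 01, ..., 23
        path = f"{base}/{date}/{hour:02d}/part-r-00000.gz"
        statements.append(
            f"load data inpath '{path}' "
            f"into table {table} partition (dt='{date}');"
        )
    return statements


if __name__ == "__main__":
    # Print the statements so they can be reviewed before being run in Hive.
    for stmt in per_hour_load_statements("yyyymmdd"):
        print(stmt)
```

The printed statements can be saved to a file and reviewed before being run in the Hive CLI; whether each load then produces the `_copy_N` names seen on HDP 2.3.2.0 should be checked on the target cluster.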