[ https://issues.apache.org/jira/browse/HIVE-18702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362512#comment-16362512 ]
Oleksiy Sayankin edited comment on HIVE-18702 at 2/13/18 3:51 PM:
------------------------------------------------------------------

*FIXED*

*ROOT-CAUSE*

The condition in this {{if}} statement evaluates to false
{code}
FileStatus[] statuses = HiveStatsUtils.getFileStatusRecurse(
    tmpPath, ((dpCtx == null) ? 1 : dpCtx.getNumDPCols()), fs);
if(statuses != null && statuses.length > 0) {
{code}
when there are no files in {{/bug/.hive-staging_hive_2018-02-13_14-14-39_529_3325659916929491937-1/_task_tmp.-ext-10000}}. Thus the folder {{-ext-10000}} is not created. After that, this section of code
{code}
protected void replaceFiles(Path tablePath, Path srcf, Path destf, Path oldPath, HiveConf conf,
        boolean isSrcLocal, boolean purge) throws HiveException {
  try {
    FileSystem destFs = destf.getFileSystem(conf);
    // check if srcf contains nested sub-directories
    FileStatus[] srcs;
    FileSystem srcFs;
    try {
      srcFs = srcf.getFileSystem(conf);
      srcs = srcFs.globStatus(srcf);
    } catch (IOException e) {
      throw new HiveException("Getting globStatus " + srcf.toString(), e);
    }
    if (srcs == null) {
      LOG.info("No sources specified to move: " + srcf);
      return;
    }
{code}
reaches {{LOG.info("No sources specified to move: " + srcf);}} and returns early, so the existing values in the table are not overwritten.

*SOLUTION*

Use {{fs.exists(tmpPath)}} instead of {{FileStatus[] statuses}}.

> INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-18702
>                 URL: https://issues.apache.org/jira/browse/HIVE-18702
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.3.2
>            Reporter: Oleksiy Sayankin
>            Assignee: Oleksiy Sayankin
>            Priority: Major
>             Fix For: 3.0.0, 2.3.3
>
>         Attachments: HIVE-18702.1.patch
>
>
> Enable Hive on TEZ. (MR works fine).
> *STEP 1. Create test data*
> {code}
> nano /home/test/users.txt
> {code}
> Add to file:
> {code}
> Peter,34
> John,25
> Mary,28
> {code}
> {code}
> hadoop fs -mkdir /bug
> hadoop fs -copyFromLocal /home/test/users.txt /bug
> hadoop fs -ls /bug
> {code}
> *EXPECTED RESULT:*
> {code}
> Found 2 items
>
> -rwxr-xr-x 3 root root 25 2015-10-15 16:11 /bug/users.txt
> {code}
> *STEP 2. Upload data to hive*
> {code}
> create external table bug(name string, age int) ROW FORMAT DELIMITED FIELDS
> TERMINATED BY ',' LINES TERMINATED BY '\n' LOCATION '/bug';
> select * from bug;
> {code}
> *EXPECTED RESULT:*
> {code}
> OK
> Peter 34
> John 25
> Mary 28
> {code}
> {code}
> create external table bug1(name string, age int) ROW FORMAT DELIMITED FIELDS
> TERMINATED BY ',' LINES TERMINATED BY '\n' LOCATION '/bug1';
> insert overwrite table bug select * from bug1;
> select * from bug;
> {code}
> *EXPECTED RESULT:*
> {code}
> OK
> Time taken: 0.097 seconds
> {code}
> *ACTUAL RESULT:*
> {code}
> hive> select * from bug;
> OK
> Peter 34
> John 25
> Mary 28
> Time taken: 0.198 seconds, Fetched: 3 row(s)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
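
The difference between the two guards named in the ROOT-CAUSE and SOLUTION notes can be illustrated with a small local-filesystem analogy. This is only a sketch: it uses {{java.nio.file}} rather than Hadoop's {{FileSystem}}, it is not the code from HIVE-18702.1.patch, and the class and method names ({{StagingDirCheckSketch}}, {{hasListedEntries}}, {{existsCheck}}) are hypothetical. It assumes the task temp directory exists but contains no files, which is the situation described for the empty source table.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class StagingDirCheckSketch {

    // Analogy of the original guard: proceed only if the directory lists at least one entry.
    static boolean hasListedEntries(Path tmpPath) throws IOException {
        if (!Files.isDirectory(tmpPath)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(tmpPath)) {
            return entries.findAny().isPresent();
        }
    }

    // Analogy of the proposed guard: proceed whenever the directory itself exists.
    static boolean existsCheck(Path tmpPath) {
        return Files.exists(tmpPath);
    }

    public static void main(String[] args) throws IOException {
        // An existing but empty directory stands in for the empty task temp path.
        Path tmpPath = Files.createTempDirectory("task_tmp");
        try {
            System.out.println("listing-based guard: " + hasListedEntries(tmpPath)); // false
            System.out.println("exists-based guard:  " + existsCheck(tmpPath));      // true
        } finally {
            Files.delete(tmpPath);
        }
    }
}
```

With an empty directory the listing-based guard is false, so the follow-up step is skipped, while the existence-based guard is true and the step would still run. That mirrors why an empty result set left the old rows in place.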