[https://issues.apache.org/jira/browse/HIVE-18702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362512#comment-16362512]

Oleksiy Sayankin edited comment on HIVE-18702 at 2/13/18 3:53 PM:
------------------------------------------------------------------

*FIXED*

*ROOT-CAUSE*

The following {{if}} condition evaluates to {{false}}

{code}
      FileStatus[] statuses = HiveStatsUtils.getFileStatusRecurse(
          tmpPath, ((dpCtx == null) ? 1 : dpCtx.getNumDPCols()), fs);
      if(statuses != null && statuses.length > 0) {
{code}

when the temporary staging folder {{/bug/.hive-staging_hive/_tmp.-ext-10000}} contains no files. As a result, the folder {{-ext-10000}} is not created, and the following section of code

{code}
  protected void replaceFiles(Path tablePath, Path srcf, Path destf, Path oldPath, HiveConf conf,
          boolean isSrcLocal, boolean purge) throws HiveException {
    try {

      FileSystem destFs = destf.getFileSystem(conf);
      // check if srcf contains nested sub-directories
      FileStatus[] srcs;
      FileSystem srcFs;
      try {
        srcFs = srcf.getFileSystem(conf);
        srcs = srcFs.globStatus(srcf);
      } catch (IOException e) {
        throw new HiveException("Getting globStatus " + srcf.toString(), e);
      }
      if (srcs == null) {
        LOG.info("No sources specified to move: " + srcf);
        return;
      }
{code}

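finds no sources to move: {{globStatus}} returns {{null}} for the missing {{srcf}}, so the method executes {{LOG.info("No sources specified to move: " + srcf);}} and returns early. The existing values in the table are therefore not overwritten.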
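To make the failure mode concrete, here is a minimal local-filesystem model of the early return in {{replaceFiles}}. It is a sketch, not Hive code: it uses {{java.nio.file}} instead of Hadoop's {{FileSystem}}, assumes both directories hold only plain files, and the class {{ReplaceFilesModel}} and its methods are hypothetical names that only mirror the control flow quoted above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, simplified model of the control flow in replaceFiles().
// Uses local java.nio.file paths instead of Hadoop's FileSystem and
// assumes both directories contain only plain files (no subdirectories).
class ReplaceFilesModel {

    // Returns false without touching destDir when srcDir is missing,
    // mirroring the "No sources specified to move" early return.
    static boolean replaceFiles(Path srcDir, Path destDir) throws IOException {
        if (!Files.exists(srcDir)) {
            System.out.println("No sources specified to move: " + srcDir);
            return false; // old table files stay in place
        }
        // Overwrite semantics: clear the destination first...
        for (Path old : listFiles(destDir)) {
            Files.delete(old);
        }
        // ...then move the (possibly empty) new output into place.
        for (Path fresh : listFiles(srcDir)) {
            Files.move(fresh, destDir.resolve(fresh.getFileName()));
        }
        return true;
    }

    // Collects directory entries eagerly so the directory can be modified afterwards.
    static List<Path> listFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.collect(Collectors.toList());
        }
    }
}
```

With a missing source directory the call leaves {{users.txt}} untouched, which matches the ACTUAL RESULT in the issue description; with an existing but empty source directory the table directory ends up empty, which matches the EXPECTED RESULT.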

*SOLUTION*

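Check {{fs.exists(tmpPath)}} instead of requiring a non-empty {{FileStatus[] statuses}} array, so the condition also holds when the staging folder exists but is empty.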
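The difference between the two conditions can be sketched on a local directory as well. {{hasFilesCheck}} approximates the {{getFileStatusRecurse(...)}} plus {{statuses.length > 0}} test for the non-dynamic-partition case (depth 1, so only direct children are listed), and {{existsCheck}} stands in for {{fs.exists(tmpPath)}}. Both names and the {{java.nio.file}} substitution are assumptions for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical comparison of the old and proposed conditions on a local directory.
class StagingChecks {

    // Old behaviour (approximation): true only when the staging folder has at
    // least one direct child, like "statuses != null && statuses.length > 0"
    // with a recursion depth of 1.
    static boolean hasFilesCheck(Path tmpPath) throws IOException {
        if (!Files.isDirectory(tmpPath)) {
            return false;
        }
        try (Stream<Path> children = Files.list(tmpPath)) {
            return children.findAny().isPresent();
        }
    }

    // Proposed behaviour: true whenever the staging folder exists, even if empty.
    static boolean existsCheck(Path tmpPath) {
        return Files.exists(tmpPath);
    }
}
```

For an empty staging folder only {{existsCheck}} is true, which is exactly the case in which the old condition skipped the folder and {{-ext-10000}} was never created.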


was (Author: osayankin):
*FIXED*

*ROOT-CAUSE*

The following {{if}} condition evaluates to {{false}}

{code}
      FileStatus[] statuses = HiveStatsUtils.getFileStatusRecurse(
          tmpPath, ((dpCtx == null) ? 1 : dpCtx.getNumDPCols()), fs);
      if(statuses != null && statuses.length > 0) {
{code}

when the folder {{/bug/.hive-staging_hive_2018-02-13_14-14-39_529_3325659916929491937-1/_task_tmp.-ext-10000}} contains no files. As a result, the folder {{-ext-10000}} is not created, and the following section of code

{code}
  protected void replaceFiles(Path tablePath, Path srcf, Path destf, Path oldPath, HiveConf conf,
          boolean isSrcLocal, boolean purge) throws HiveException {
    try {

      FileSystem destFs = destf.getFileSystem(conf);
      // check if srcf contains nested sub-directories
      FileStatus[] srcs;
      FileSystem srcFs;
      try {
        srcFs = srcf.getFileSystem(conf);
        srcs = srcFs.globStatus(srcf);
      } catch (IOException e) {
        throw new HiveException("Getting globStatus " + srcf.toString(), e);
      }
      if (srcs == null) {
        LOG.info("No sources specified to move: " + srcf);
        return;
      }
{code}

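finds no sources to move: {{globStatus}} returns {{null}} for the missing {{srcf}}, so the method executes {{LOG.info("No sources specified to move: " + srcf);}} and returns early. The existing values in the table are therefore not overwritten.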

*SOLUTION*

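Check {{fs.exists(tmpPath)}} instead of requiring a non-empty {{FileStatus[] statuses}} array, so the condition also holds when the staging folder exists but is empty.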

> INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-18702
>                 URL: https://issues.apache.org/jira/browse/HIVE-18702
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.3.2
>            Reporter: Oleksiy Sayankin
>            Assignee: Oleksiy Sayankin
>            Priority: Major
>             Fix For: 3.0.0, 2.3.3
>
>         Attachments: HIVE-18702.1.patch
>
>
> Enable Hive on Tez (MapReduce works fine).
> *STEP 1. Create test data*
> {code}
> nano /home/test/users.txt
> {code}
> Add the following lines to the file:
> {code}
> Peter,34
> John,25
> Mary,28
> {code}
> {code}
> hadoop fs -mkdir /bug
> hadoop fs -copyFromLocal /home/test/users.txt /bug
> hadoop fs -ls /bug
> {code}
> *EXPECTED RESULT:*
> {code}
> Found 2 items
> -rwxr-xr-x   3 root root         25 2015-10-15 16:11 /bug/users.txt
> {code}
> *STEP 2. Upload data to Hive*
> {code}
> create external table bug(name string, age int)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
> LOCATION '/bug';
> select * from bug;
> {code}
> *EXPECTED RESULT:*
> {code}
> OK
> Peter   34
> John    25
> Mary    28
> {code}
> {code}
> create external table bug1(name string, age int)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
> LOCATION '/bug1';
> insert overwrite table bug select * from bug1;
> select * from bug;
> {code}
> *EXPECTED RESULT:*
> {code}
> OK
> Time taken: 0.097 seconds
> {code}
> *ACTUAL RESULT:*
> {code}
> hive>  select * from bug;
> OK
> Peter 34
> John  25
> Mary  28
> Time taken: 0.198 seconds, Fetched: 3 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
