[ 
https://issues.apache.org/jira/browse/HIVE-24936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish JP resolved HIVE-24936.
------------------------------
       Fix Version/s: 4.0.0
    Target Version/s: 4.0.0
          Resolution: Fixed

> Fix file name parsing and copy file move.
> -----------------------------------------
>
>                 Key: HIVE-24936
>                 URL: https://issues.apache.org/jira/browse/HIVE-24936
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Harish JP
>            Assignee: Harish JP
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The taskId and taskAttemptId are not extracted correctly for copy files 
> (e.g. 00001_02_copy_3), and when an incompatible copy file is moved, the 
> rename utility generates wrong file names. Ex: if 00001_02_copy_3 already 
> exists, 00001_02_copy_3 is renamed to 00001_02_copy_3_1; ideally it should 
> be 00001_02_copy_N.
>  
> Incompatible files should always be renamed using the current task; otherwise 
> a file can be deleted if its name conflicts with another task's output file. 
> Ex: if the input file for a task is named 00005_01 and is incompatible, 
> moving it as-is makes it look like the output of task id 5, attempt 1. If 
> that task attempt exists, it will try to generate the same file and fail, and 
> another attempt will be made. There will then be 2 files, 00005_01 and 
> 00005_02, and the deduping code will remove 00005_01, resulting in data loss. 
> The same can happen in other scenarios.
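
The parsing and renaming behaviour described in the first paragraph can be
sketched roughly as follows. This is an illustration only, not the code from
the actual fix: the class CopyFileNames, its methods, and the assumed name
layout <taskId>_<attemptId>[_copy_<N>] (no prefixes or extensions) are all
hypothetical. It shows the copy index being bumped rather than a "_1" suffix
being appended.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not Hive's actual implementation.
final class CopyFileNames {
    // Assumed layout: <taskId>_<attemptId>, optionally followed by _copy_<N>.
    private static final Pattern NAME =
        Pattern.compile("^(\\d+)_(\\d+)(?:_copy_(\\d+))?$");

    /** Returns {taskId, attemptId, copyIndex}; copyIndex is 0 without a _copy_ suffix. */
    static int[] parse(String name) {
        Matcher m = match(name);
        int copy = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return new int[] {
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), copy
        };
    }

    /** Picks the first unused name by bumping the copy index instead of appending "_1". */
    static String nextFreeName(String name, Set<String> existing) {
        if (!existing.contains(name)) {
            return name;
        }
        Matcher m = match(name);
        String base = m.group(1) + "_" + m.group(2); // keeps the original zero padding
        int copy = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        String candidate;
        do {
            copy++;
            candidate = base + "_copy_" + copy;
        } while (existing.contains(candidate));
        return candidate;
    }

    private static Matcher match(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized file name: " + name);
        }
        return m;
    }
}
```

Under these assumptions, moving 00001_02_copy_3 into a directory that already
contains 00001_02_copy_3 yields 00001_02_copy_4, which matches the
00001_02_copy_N form asked for above.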



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
