Prasanth Jayachandran created HIVE-17403:
--------------------------------------------

             Summary: Fail concatenation for unmanaged tables
                 Key: HIVE-17403
                 URL: https://issues.apache.org/jira/browse/HIVE-17403
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.3.0, 3.0.0, 2.4.0
            Reporter: Prasanth Jayachandran
            Assignee: Prasanth Jayachandran
            Priority: Blocker


ALTER TABLE .. CONCATENATE should fail if the table is not managed by Hive.

For unmanaged tables, file names can be anything. Hive makes assumptions
about file names that can result in data loss for unmanaged tables.

For example, consider a table/partition containing two different files
(part-m-00000__1417075294718 and part-m-00018__1417075294718). Although they
are completely different files, Hive treats them as outputs of separate
instances of the same task (due to failure or speculative execution) and
ends up removing one of them (here part-m-00018__1417075294718), as the log
below shows:

{code}
2017-08-28T18:19:29,516 WARN  [b27f10d5-d957-4695-ab2a-1453401793df main]: 
exec.Utilities (:()) - Duplicate taskid file removed: 
file:/Users/table/part=20141120/.hive-staging_hive_2017-08-28_18-19-27_210_3381701454205724533-1/_tmp.-ext-10000/part-m-00018__1417075294718
 with length 958510. Existing file: 
file:/Users/table/part=20141120/.hive-staging_hive_2017-08-28_18-19-27_210_3381701454205724533-1/_tmp.-ext-10000/part-m-00000__1417075294718
 with length 1123116
{code}
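
To illustrate why the two names collide, here is a minimal sketch that derives
a task id from a file name. The regular expression is an assumption modeled on
the kind of pattern Hive applies to output file names and is not copied from
the Hive source; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskIdCollision {
    // Assumed pattern: the first run of digits that lets the rest of the name
    // match (optional "_<attempt>" suffix, optional extension) is the task id.
    static final Pattern TASK_ID =
        Pattern.compile("^.*?([0-9]+)(_[0-9]{1,6})?(\\..*)?$");

    // Returns the derived task id, or the whole name if the pattern does not match.
    static String taskId(String fileName) {
        Matcher m = TASK_ID.matcher(fileName);
        return m.matches() ? m.group(1) : fileName;
    }

    public static void main(String[] args) {
        String[] names = {
            "000000_0",                     // name in the style Hive itself writes
            "000001_0",
            "part-m-00000__1417075294718",  // externally written files from this report
            "part-m-00018__1417075294718"
        };
        for (String name : names) {
            System.out.println(name + " -> task id " + taskId(name));
        }
    }
}
```

Under this assumed pattern, the double underscore prevents 00000 and 00018
from being taken as the task id, so both part-m-* names resolve to the shared
trailing number 1417075294718 and look like duplicate outputs of one task.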

DDL should restrict concatenation for unmanaged tables. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
