[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744143#action_12744143
 ] 

Todd Lipcon commented on HIVE-718:
----------------------------------

Prasad: this is true, but the atomicity issue was already there - we just 
discovered it while looking at this section of the code. My guess is that there 
are similar bugs elsewhere in Hive - there's no safe way to have multiple 
threads compete for a directory name, so really any movement of files has to be 
coordinated by the metastore to be safe. The hadoop-side issues are:

- mkdirs on an existing directory will return the exact same thing as mkdirs on 
one that didn't previously exist
- rename of one directory to a new name will not give an error if the new name 
exists - rather, it will move your directory inside that one
- checking if (not exists) { do something to a location } is obviously racy

> Load data inpath into a new partition without overwrite does not move the file
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-718
>                 URL: https://issues.apache.org/jira/browse/HIVE-718
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>         Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as following. Note that it only happens for 
> partitioned tables. The select after the first load returns nothing, while 
> the second returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p ( value string) partitioned by (ds string) 
> > stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
> > (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds= '2009-08-01';
> a       2009-08-01
> b       2009-08-01
> d       2009-08-01
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to