[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754632#action_12754632 ]
Todd Lipcon commented on HIVE-718:
----------------------------------

Even in the case of LOAD...OVERWRITE, we currently lack atomicity. The old directory is deleted prior to the new directory being moved in. There's a small window of time in which neither old nor new data is present. Sure, this is probably on the order of a half second, but it is still not correct. There's also the general case of a query which has already computed input splits being affected by a concurrent LOAD DATA OVERWRITE.

The versioning solution I think gets us partly there, but will be very tricky to implement correctly while maintaining performance since there's no way to do a copy-on-write directory snapshot built in to HDFS. So, I think a significant amount of work will have to be done in the metastore and we'll definitely have to drop the "external process load" ability that currently exists.

Should we open a new JIRA for the general concurrency control/locking issues we're discussing here? It seems like this ticket should be used for the 0.3->0.4 regression, even if it's just a temporary fix. We can then put a more general correct solution on the roadmap for 0.5 or later, since it's looking like it will be a complicated project.

> Load data inpath into a new partition without overwrite does not move the file
> ------------------------------------------------------------------------------
>
> Key: HIVE-718
> URL: https://issues.apache.org/jira/browse/HIVE-718
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
> Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt
>
>
> The bug can be reproduced as follows. Note that it only happens for
> partitioned tables. The select after the first load returns nothing, while
> the second returns the data correctly.
> insert.txt in the current local directory contains 3 lines: "a", "b" and "c".
> {code}
> > create table tmp_insert_test (value string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test;
> > select * from tmp_insert_test;
> a
> b
> c
> > create table tmp_insert_test_p (value string) partitioned by (ds string) stored as textfile;
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds = '2009-08-01';
> > load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
> > select * from tmp_insert_test_p where ds = '2009-08-01';
> a 2009-08-01
> b 2009-08-01
> c 2009-08-01
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
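
To illustrate the atomicity gap raised in the comment above (the old directory is deleted before the new one is moved in), here is a minimal sketch of the versioning idea on a local filesystem. It is only an analogy: it uses neither HDFS nor Hive APIs, and the names `publish_version`, `read_current`, the `vN` directory layout, and the `CURRENT` pointer file are all hypothetical. In Hive the pointer would presumably live in the metastore rather than in a file. Each load writes a complete new version directory and then repoints `CURRENT` with a single rename, so a reader sees either the old data or the new data, never neither.

```python
import os
import tempfile


def publish_version(table_dir, rows):
    """Write rows into a fresh version directory, then repoint CURRENT to it."""
    existing = [d for d in os.listdir(table_dir) if d.startswith("v")]
    next_id = 1 + max((int(d[1:]) for d in existing), default=0)
    version = f"v{next_id}"

    # Step 1: stage the new data next to the old data; nothing is deleted.
    os.mkdir(os.path.join(table_dir, version))
    with open(os.path.join(table_dir, version, "data.txt"), "w") as f:
        f.write("\n".join(rows) + "\n")

    # Step 2: switch readers over with one rename of the pointer file.
    # os.replace is atomic on a local POSIX filesystem (an assumption of
    # this sketch; HDFS rename semantics differ).
    tmp = os.path.join(table_dir, "CURRENT.tmp")
    with open(tmp, "w") as f:
        f.write(version)
    os.replace(tmp, os.path.join(table_dir, "CURRENT"))
    return version


def read_current(table_dir):
    """Resolve the CURRENT pointer once, then read that version's rows."""
    with open(os.path.join(table_dir, "CURRENT")) as f:
        version = f.read().strip()
    with open(os.path.join(table_dir, version, "data.txt")) as f:
        return f.read().splitlines()


if __name__ == "__main__":
    table = tempfile.mkdtemp()
    publish_version(table, ["a", "b", "c"])
    publish_version(table, ["x", "y"])
    print(read_current(table))
```

Because old version directories are left in place, a reader that resolved the pointer before an overwrite can keep reading its version, which is the property wanted for queries that have already computed input splits. The sketch does not cover concurrent writers or cleanup of old versions, which is where most of the real work would be.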