[ 
https://issues.apache.org/jira/browse/HIVE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109247#comment-13109247
 ] 

Chinna Rao Lalam commented on HIVE-1996:
----------------------------------------

Hi He Yongqiang,
These 2 scenarios need to be considered:
1) If rename is disabled, load a data folder that contains 10 files 
(1.txt, 2.txt, ..., 10.txt) when the table already contains a file with the 
same name, 5.txt. Loading 5.txt throws an exception and the operation fails, 
but the files already loaded (1.txt, 2.txt, ..., 4.txt) remain in the table.

Similarly, load the same folder (1.txt, 2.txt, ..., 10.txt) when the table 
already contains a file with the same name, 6.txt. Loading 6.txt throws an 
exception and the operation fails, but the files already loaded 
(1.txt, 2.txt, ..., 4.txt, 5.txt) remain in the table.

        So the result depends mainly on the load order and can cause inconsistencies.

2) The current implementation of 
"org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Path, Path, FileSystem)" 
has the same problem: if it is unable to rename any one of the files it throws 
an exception, but some files from the same operation will already be loaded.

Proposed solution: While loading, if an exception occurs for a file, record 
that file as unloaded and continue the load with the remaining files. The 
operation then fails with the exception and the list of unloaded files, so the 
user can retry loading the unloaded files alone. This way there is no 
inconsistent data.
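
To make the proposal concrete, here is a rough sketch of the "continue and 
report" idea. It is only an illustration, not the patch: it uses plain 
java.nio.file instead of the Hadoop FileSystem API, and the names LoadSketch, 
PartialLoadException and loadAll are hypothetical, not anything in Hive.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LoadSketch {

    /** Hypothetical exception that carries the files which were not loaded. */
    public static class PartialLoadException extends IOException {
        private final List<Path> unloaded;

        public PartialLoadException(List<Path> unloaded, Throwable firstCause) {
            super("Could not load " + unloaded.size() + " file(s): " + unloaded, firstCause);
            this.unloaded = new ArrayList<>(unloaded);
        }

        public List<Path> getUnloaded() {
            return unloaded;
        }
    }

    /**
     * Moves every source file into destDir. A failure on one file does not
     * stop the loop; the file is recorded and the remaining files are tried.
     * If anything failed, one exception listing all unloaded files is thrown.
     */
    public static void loadAll(List<Path> sources, Path destDir) throws PartialLoadException {
        List<Path> unloaded = new ArrayList<>();
        IOException first = null;
        for (Path src : sources) {
            try {
                // Without REPLACE_EXISTING, Files.move fails if the target exists.
                Files.move(src, destDir.resolve(src.getFileName()));
            } catch (IOException e) {
                unloaded.add(src);
                if (first == null) {
                    first = e; // keep the first failure as the cause
                }
            }
        }
        if (!unloaded.isEmpty()) {
            throw new PartialLoadException(unloaded, first);
        }
    }
}
```

With this shape the caller can catch the exception, read getUnloaded(), and 
retry only those files after fixing the name conflict.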

Please give your inputs.

> "LOAD DATA INPATH" fails when the table already contains a file of the same 
> name
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-1996
>                 URL: https://issues.apache.org/jira/browse/HIVE-1996
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Kirk True
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-1996.1.Patch, HIVE-1996.Patch
>
>
> Steps:
> 1. From the command line copy the kv2.txt data file into the current user's 
> HDFS directory:
> {{$ hadoop fs -copyFromLocal /path/to/hive/sources/data/files/kv2.txt kv2.txt}}
> 2. In Hive, create the table:
> {{create table tst_src1 (key_ int, value_ string);}}
> 3. Load the data into the table from HDFS:
> {{load data inpath './kv2.txt' into table tst_src1;}}
> 4. Repeat step 1
> 5. Repeat step 3
> Expected:
> To have kv2.txt renamed in HDFS and then copied to the destination as per 
> HIVE-307.
> Actual:
> File is renamed, but {{Hive.copyFiles}} doesn't "see" the change in {{srcs}} 
> as it continues to use the same array elements (with the un-renamed, old file 
> names). It crashes with this error:
> {noformat}
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:1725)
>     at org.apache.hadoop.hive.ql.metadata.Table.copyFiles(Table.java:541)
>     at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1173)
>     at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:197)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1060)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:897)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:745)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
