As far as I can tell, file moves are atomic. See http://search-hadoop.com/m/pwiE52olA3O1.
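Here is a minimal sketch of the stage-then-rename pattern discussed in this thread. It uses the local filesystem as a stand-in for HDFS, which is an assumption made purely so the example runs without a cluster. The function name `publish` and the directory names are hypothetical. On HDFS the two steps would correspond to a copy into a temporary directory followed by a rename (for example with `hadoop fs -mv`).

```python
import os
import tempfile


def publish(data: bytes, staging_dir: str, input_dir: str, name: str) -> str:
    """Write data into staging_dir, then rename it into input_dir.

    Hypothetical helper: a reader that lists input_dir sees either no
    file or the complete file, never a partially written one.
    """
    staged = os.path.join(staging_dir, name)
    with open(staged, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before renaming

    final = os.path.join(input_dir, name)
    # A rename only updates directory metadata. On POSIX systems it is
    # atomic when both directories live on the same filesystem.
    os.replace(staged, final)
    return final


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "staging")
        inputs = os.path.join(root, "input")
        os.mkdir(staging)
        os.mkdir(inputs)

        path = publish(b"record\n" * 1000, staging, inputs, "part-0001.txt")
        print("published:", os.path.basename(path))
        print("input dir:", os.listdir(inputs))
```

The slow part (writing the bytes) happens entirely outside the directory the job reads, so the job only ever observes the result of the rename.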
I've used this approach at my former workplace, and I'm sure a lot of people use the same approach without hitting the scenario you describe.

Note that it's just the inode tree that's manipulated. The file itself, in the fullest sense, isn't "moved". It's just a rename, so it can't be partial.

On Wed, May 2, 2012 at 9:20 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> So let's consider a case where I copied the file from local to an HDFS temporary
> directory and then, after copying, I executed a move to some Input dir. This
> takes a fraction of a second, but let's assume that my job is running on that
> Input folder at the point in time when the file is getting moved, and it
> tries to access the half-moved file.
>
> Now what happens? Does HDFS throw some IOException, or will it leave the file
> unprocessed until the next job runs?
>
> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Tuesday, May 01, 2012 6:11 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: File Integrity in HDFS
>
> Yes, renames/moves are merely metadata changes, like on your local filesystem
> (unless you move across partitions/disks, a concept that wouldn't apply to a
> DFS).
>
> On Tue, May 1, 2012 at 5:53 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
>> Thanks Harsh,
>> I also saw that when we copy from local to HDFS or from HDFS to
>> HDFS, it takes considerable time depending on file size, but if we move
>> within HDFS, it is done instantly.
>> So internally does HDFS just rename the file and its metadata?
>>
>> -----Original Message-----
>> From: Harsh J [mailto:ha...@cloudera.com]
>> Sent: Tuesday, May 01, 2012 5:22 PM
>> To: hdfs-user@hadoop.apache.org
>> Subject: Re: File Integrity in HDFS
>>
>> The easiest way out would be to rename files to a pick-up-able name upon
>> successful copy, or have the loading done to a different directory and
>> rename/move the file to the job input directory once it is successfully closed.
>>
>> On Tue, May 1, 2012 at 3:22 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
>>> Hi All,
>>>
>>> I have a scenario in which input files are copied to HDFS and MR jobs
>>> run on the input directory.
>>>
>>> Now there can be a scenario in which a file is still being copied to HDFS
>>> when an MR job starts. In this case I do not want my MR job to pick
>>> up files whose copy to HDFS is not yet complete.
>>>
>>> Is there any way/API to check whether a file has been completely
>>> written to HDFS?
>>>
>>> Regards,
>>> Stuti Awasthi
>>> HCL Comnet Systems and Services Ltd
>>> F-8/9 Basement, Sec-3, Noida.
--
Harsh J