As far as I can tell, file moves are atomic. See
http://search-hadoop.com/m/pwiE52olA3O1.

I've used this approach at my former workplace, and I'm sure there are a
lot of people using the same approach without hitting the scenario you
describe.

Note that it's just the inode tree that's manipulated. The file itself,
in the fullest sense, isn't "moved". It's just a rename, so it can't be
partial.
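
As a rough local-filesystem analogy (this is not HDFS code), the
stage-then-rename approach can be sketched in Python as below. The
`publish` helper, the directory names, and the file name are all made up
for illustration. On HDFS the equivalent would be a copy into a temporary
directory followed by a rename into the job input directory.

```python
import os
import tempfile


def publish(data: bytes, staging_dir: str, input_dir: str, name: str) -> str:
    """Write data completely in staging_dir, then rename it into input_dir.

    Hypothetical helper: a reader listing input_dir sees either no file
    or the whole file, because the slow part (writing) happens elsewhere.
    """
    staged = os.path.join(staging_dir, name)
    with open(staged, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before publishing

    final = os.path.join(input_dir, name)
    # Assumes both directories live on the same filesystem, so the rename
    # only changes directory metadata and copies no data.
    os.rename(staged, final)
    return final


# Illustrative layout: staging/ and input/ under one temporary root.
root = tempfile.mkdtemp()
staging_dir = os.path.join(root, "staging")
input_dir = os.path.join(root, "input")
os.makedirs(staging_dir)
os.makedirs(input_dir)

payload = b"x" * 1024
final_path = publish(payload, staging_dir, input_dir, "part-0001")
visible = os.listdir(input_dir)
```

A job that only ever reads from the input directory never sees the file
while it is still being written, since the write happens in the staging
directory.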

On Wed, May 2, 2012 at 9:20 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> So let's consider a case where I copy the file from local to an HDFS temporary
> directory and then, after copying, execute a move to some input dir. This
> takes a fraction of a second, but let's assume that my job is running on that
> input folder at the moment the file is being moved, and it
> tries to access the half-moved file.
>
> Now what happens? Does HDFS throw some IOException, or will it leave the file
> unprocessed until the next job runs?
>
> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Tuesday, May 01, 2012 6:11 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: File Integrity in HDFS
>
> Yes, renames/moves are merely metadata changes, like on your local filesystem
> (unless you move across partitions/disks, a concept that wouldn't apply to a
> DFS).
>
> On Tue, May 1, 2012 at 5:53 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
>> Thanks Harsh,
>> I also noticed that copying from local to HDFS, or from HDFS to
>> HDFS, takes considerable time depending on file size, but a move
>> within HDFS is done instantly.
>> So internally does HDFS just rename the file and its metadata?
>>
>> -----Original Message-----
>> From: Harsh J [mailto:ha...@cloudera.com]
>> Sent: Tuesday, May 01, 2012 5:22 PM
>> To: hdfs-user@hadoop.apache.org
>> Subject: Re: File Integrity in HDFS
>>
>> The easiest way out would be to rename files to a pick-up-able name upon
>> successful copy, or to load them into a different directory and
>> rename/move each file into the job input directory once it is successfully closed.
>>
>> On Tue, May 1, 2012 at 3:22 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
>>> Hi All,
>>>
>>>
>>>
>>> I have a scenario in which input files are copied to HDFS and MR jobs
>>> run on the input directory.
>>>
>>> Now there can be a case in which a file is still being copied to HDFS
>>> when an MR job starts. In this case I do not want my MR job to pick
>>> up files whose copy to HDFS is not yet
>>> complete.
>>>
>>> Is there any way/API to check whether a file has been completely
>>> written to HDFS?
>>>
>>>
>>>
>>> Regards,
>>>
>>> Stuti Awasthi
>>>
>>> HCL Comnet Systems and Services Ltd
>>>
>>> F-8/9 Basement, Sec-3,Noida.
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



-- 
Harsh J
