I know about the current behaviour of HDFS. I am proposing the new
behaviour that I described in my first mail.

In Hadoop 0.20.2, on an overwrite a new block is allocated and stored on the
datanodes, and a new INode is created in the namespace. Why is an overwrite
treated as a file creation operation?

-vidur
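
A minimal sketch of the O_TRUNC analogy from Todd's quoted reply, using only
java.nio on a local filesystem. It does not touch HDFS or any Hadoop API, and
the class and method names (OverwriteSketch, overwrite, removeThenCreate) are
hypothetical, chosen only for illustration. It shows that truncating on open
and remove-then-create leave the same observable result: none of the old
bytes survive.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; local filesystem only, not HDFS.
class OverwriteSketch {

    // Overwrite in the O_TRUNC sense: the existing contents are discarded
    // and writing starts again at offset 0.
    static void overwrite(Path p, String data) throws IOException {
        Files.write(p, data.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
    }

    // The same outcome expressed as a remove followed by a create.
    static void removeThenCreate(Path p, String data) throws IOException {
        Files.deleteIfExists(p);
        Files.write(p, data.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.WRITE);
    }

    // Helper to read the whole file back as a string.
    static String read(Path p) throws IOException {
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    }
}
```

Writing a long payload and then calling either method with a shorter one
leaves only the shorter payload in the file. How HDFS implements the
equivalent step internally is described in the thread itself, not by this
sketch.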
> Hi Vidur,
>
> I'm not following. The "overwrite" flag causes the file to be overwritten
> starting at offset 0 - it doesn't allow you to retain any bit of the
> preexisting file. It's equivalent to a remove followed by a create. Think
> of it like O_TRUNC.
>
> -Todd
>
> On Mon, Jun 21, 2010 at 10:03 PM, Vidur Goyal
> <vi...@students.iiit.ac.in> wrote:
>
>> Dear Todd,
>>
>> By truncating I meant removing the unused *blocks* from the namespace and
>> letting them be garbage collected. There will be no truncation of the last
>> block (even if it is not full). This way, rather than garbage collecting
>> all the blocks of a file, we will only be garbage collecting the
>> remaining blocks.
>>
>> -vidur
>>
>>
>> > HDFS assumes in hundreds of places that blocks never shrink. So, there
>> > is no option to truncate a block.
>> >
>> > -Todd
>> >
>> > On Mon, Jun 21, 2010 at 9:41 PM, Vidur Goyal
>> > <vi...@students.iiit.ac.in> wrote:
>> >
>> >> Hi All,
>> >>
>> >> In FSNamesystem#startFileInternal, whenever the overwrite flag is set,
>> >> why is the INode removed from the namespace and a new
>> >> INodeFileUnderConstruction created? Why can't we convert the same
>> >> INode to an INodeFileUnderConstruction? We could then start writing to
>> >> the same blocks at the same datanodes (after incrementing the GS),
>> >> followed by either truncating the remaining blocks (if the file size
>> >> decreases) or allocating new blocks (if the file size increases). This
>> >> will decrease data redundancy and the work of the garbage collector,
>> >> and will increase security.
>> >>
>> >> vidur
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> This message has been scanned for viruses and
>> >> dangerous content by MailScanner, and is
>> >> believed to be clean.
>> >>
>> >>
>> >
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera