On Mon, Jun 21, 2010 at 10:31 PM, Vidur Goyal <vi...@students.iiit.ac.in> wrote:

> I know about the current behaviour of HDFS. I am proposing the new
> behaviour which I mentioned in my first mail.
>
>
If you're going to propose new behavior, you should be prepared to explain
why that new behavior is useful, and what the new behavior is, in terms of
visible semantics, not in terms of implementation. It sounded to me like you
were proposing an optimization for a very infrequent operation which is
never the bottleneck of a real use case.


> In Hadoop 0.20.2, a new block is allocated and stored at the datanodes, and
> a new INode is created in the namespace. Why is an overwrite considered a
> file creation operation?
>

I answered above - a new block is created because blocks cannot shrink. It's
considered a file creation because otherwise the old file would have shrunk,
which is not allowed.
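As an aside for anyone skimming the thread: the visible semantics described above (overwrite == remove followed by create, i.e. O_TRUNC) can be illustrated outside HDFS with plain Java NIO. This is a local-filesystem sketch of the analogy, not HDFS code:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OverwriteDemo {
    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("overwrite-demo", ".txt");
        Files.write(p, "a fairly long original file".getBytes(StandardCharsets.UTF_8));

        // Overwrite semantics: equivalent to remove-then-create (O_TRUNC).
        // No byte of the old content survives, even beyond the new length.
        Files.write(p, "short".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);

        System.out.println(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
        Files.delete(p);
    }
}
```

This prints "short" with nothing of the original content left over, which is exactly what the overwrite flag gives you: writing restarts at offset 0 and no part of the preexisting file is retained.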

-Todd


> -vidur
> > Hi Vidur,
> >
> > I'm not following. The "overwrite" flag causes the file to be overwritten
> > starting at offset 0 - it doesn't allow you to retain any bit of the
> > preexisting file. It's equivalent to a remove followed by a create. Think
> > of
> > it like O_TRUNC.
> >
> > -Todd
> >
> > On Mon, Jun 21, 2010 at 10:03 PM, Vidur Goyal
> > <vi...@students.iiit.ac.in> wrote:
> >
> >> Dear Todd,
> >>
> >> By truncating I meant removing unused *blocks* from the namespace and
> >> letting them be garbage collected. There will be no truncation of the
> >> last block (even if it is not full). This way, rather than garbage
> >> collecting all the blocks of a file, we will only be garbage collecting
> >> the remaining blocks.
> >>
> >> -vidur
> >>
> >>
> >> > HDFS assumes in hundreds of places that blocks never shrink, so there
> >> > is no option to truncate a block.
> >> >
> >> > -Todd
> >> >
> >> > On Mon, Jun 21, 2010 at 9:41 PM, Vidur Goyal
> >> > <vi...@students.iiit.ac.in> wrote:
> >> >
> >> >> Hi All,
> >> >>
> >> >> In FSNamesystem#startFileInternal, whenever the overwrite flag is
> >> >> set, why is the INode removed from the namespace and a new
> >> >> INodeFileUnderConstruction created? Why can't we convert the same
> >> >> INode to an INodeFileUnderConstruction and start writing to the same
> >> >> blocks at the same datanodes (after incrementing the GS), followed by
> >> >> either truncating the remaining blocks (if the file size decreases)
> >> >> or allocating new blocks (if the file size increases)? This would
> >> >> decrease data redundancy and the work of the garbage collector, and
> >> >> would increase security.
> >> >>
> >> >> vidur
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> This message has been scanned for viruses and
> >> >> dangerous content by MailScanner, and is
> >> >> believed to be clean.
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > Todd Lipcon
> >> > Software Engineer, Cloudera
> >> >
> >> >
> >>
> >>
> >>
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
> >
> >
>
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
