I know about the current behaviour of HDFS. I am proposing the new behaviour that I mentioned in my first mail.
In Hadoop-0.20.2, a new block is allocated and stored at the datanodes, and a new INode is created in the namespace. Why is an overwrite considered a file creation operation?

-vidur

> Hi Vidur,
>
> I'm not following. The "overwrite" flag causes the file to be overwritten
> starting at offset 0 - it doesn't allow you to retain any bit of the
> preexisting file. It's equivalent to a remove followed by a create. Think
> of it like O_TRUNC.
>
> -Todd
>
> On Mon, Jun 21, 2010 at 10:03 PM, Vidur Goyal
> <vi...@students.iiit.ac.in> wrote:
>
>> Dear Todd,
>>
>> By truncating I meant removing unused *blocks* from the namespace and
>> letting them be garbage collected. There will be no truncation of the
>> last block (even if it is not full). This way, rather than garbage
>> collecting all the blocks of a file, we will only be garbage collecting
>> the remaining blocks.
>>
>> -vidur
>>
>> > HDFS assumes in hundreds of places that blocks never shrink. So, there
>> > is no option to truncate a block.
>> >
>> > -Todd
>> >
>> > On Mon, Jun 21, 2010 at 9:41 PM, Vidur Goyal
>> > <vi...@students.iiit.ac.in> wrote:
>> >
>> >> Hi All,
>> >>
>> >> In FSNamesystem#startFileInternal, whenever the overwrite flag is
>> >> set, why is the INode removed from the namespace and a new
>> >> INodeFileUnderConstruction created? Why can't we convert the same
>> >> INode to an INodeFileUnderConstruction? We would then start writing
>> >> to the same blocks at the same datanodes (after incrementing the GS),
>> >> followed by either truncating the remaining blocks (if the file size
>> >> decreases) or allocating new blocks (if the file size increases).
>> >> This will decrease data redundancy and the job of the garbage
>> >> collector, and will increase security.
>> >>
>> >> vidur
>> >>
>> >> --
>> >> This message has been scanned for viruses and
>> >> dangerous content by MailScanner, and is
>> >> believed to be clean.
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
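
For readers following the thread, Todd's O_TRUNC analogy can be sketched on a local file system. This is only an illustration using java.nio, not HDFS code: the class name OverwriteSketch and both helper methods are made up for this example, and nothing here models blocks, INodes or generation stamps. It shows that opening with truncation and doing a remove followed by a create leave the same result, with none of the old bytes retained.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; it is not part of Hadoop.
public class OverwriteSketch {

    // Overwrite in the O_TRUNC sense: the old contents are discarded
    // and writing starts again at offset 0.
    static void overwrite(Path file, String contents) throws IOException {
        try (OutputStream out = Files.newOutputStream(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            out.write(contents.getBytes(StandardCharsets.UTF_8));
        }
    }

    // The equivalent spelled out as two steps: remove, then create.
    static void removeThenCreate(Path file, String contents) throws IOException {
        Files.deleteIfExists(file);
        try (OutputStream out = Files.newOutputStream(file,
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.WRITE)) {
            out.write(contents.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("overwrite-a", ".txt");
        Path b = Files.createTempFile("overwrite-b", ".txt");
        byte[] old = "0123456789".getBytes(StandardCharsets.UTF_8);
        Files.write(a, old);
        Files.write(b, old);

        overwrite(a, "abc");
        removeThenCreate(b, "abc");

        // Both files now hold only the 3 new bytes; none of the old 10 remain.
        System.out.println(Files.size(a) + " " + Files.size(b));

        Files.delete(a);
        Files.delete(b);
    }
}
```

The proposal in this thread is different in kind: it would keep the existing INode and reuse its blocks in place, which has no counterpart in the sketch above.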