You are so nice, thank you very much :) Last question: can I trigger block sync without restarting hdfs?
Sent from my iPhone

On 2011-9-8, at 15:00, Todd Lipcon <t...@cloudera.com> wrote:

> 2011/9/7 kang hua <kanghua...@msn.com>:
>> Thanks, my friend!
>> Please allow me to ask more questions about the details.
>> 1. Yes, I can use hadoop fs -tail or -cat xxx to see that file's content. But
>> how can I get that file's real size from another process if the namenode has
>> not been updated? What I really want is to read the data at the tail of that file.
>
> You can open the file and then use an API on the DFSInputStream class
> to find the length. I don't recall the name of the API, but if you
> look in there, you should see it.
>
>> 2. Why is it that "when I reboot hdfs, I can see the file content that I flushed
>> again by 'hadoop fs -ls xxx'"?
>
> On restart, the namenode triggers block synchronization, and the
> up-to-date length is determined.
>
>> 3. In append mode, I close the file and reopen it in append mode again and again.
>> The real data space increases normally, but the namenode shows DFS used space
>> increasing too fast. Is it a bug?
>
> Might be a bug, yes.
>
>> 4. In which version of hdfs is append not buggy?
>
> 0.21, which is buggy in other aspects. So, no stable released version
> has a working append() call.
>
> In truth I've never seen a _good_ use case for
> append-to-an-existing-file. Usually you can do just as well by keeping
> the file open and periodically hflushing, or rolling to a new file
> when you want to add more records to an existing dataset.
>
> -Todd
>
>>> From: t...@cloudera.com
>>> Date: Wed, 7 Sep 2011 14:17:10 -0700
>>> Subject: Re: Question about hdfs close & hflush behavior
>>> To: hdfs-user@hadoop.apache.org
>>>
>>> 2011/9/7 kang hua <kanghua...@msn.com>:
>>>> Hi friends:
>>>> I have two questions.
>>>> The first one is:
>>>> I use libhdfs's hflush to flush my data to a file; in the same process
>>>> context I can read it. But I find the file unchanged if I check from the hadoop
>>>> shell ---- its length is zero (checked by "hadoop fs -ls xxx" or by reading it
>>>> in a program); however, when I reboot hdfs, I can read the file content that I
>>>> flushed. Why?
>>>
>>> If we were to update the file metadata on hflush, it would be very
>>> expensive, since the metadata lives in the NameNode.
>>>
>>> If you do hadoop fs -cat xxx, you should see the entirety of the flushed
>>> data.
>>>
>>>> Can I hflush data to a file without closing it, and at the same time read
>>>> the flushed data from another process?
>>>
>>> Yes.
>>>
>>>> The second one is:
>>>> Once an hdfs file is closed, is the last written block untouched? Even if I
>>>> open the file in append mode, will the namenode allocate a new block for the
>>>> appended data?
>>>
>>> No, it reopens the last block of the existing file for append.
>>>
>>>> I find that if I close the file and open it in append mode again and again,
>>>> the hdfs report shows "used space much more than the file's logical size".
>>>
>>> Not sure I follow what you mean by this. Can you give more detail?
>>>
>>>> btw: I use cloudera ch2
>>>
>>> The actual "append()" function has some bugs in all of the 0.20
>>> releases, including Cloudera's. The hflush/sync() API is fine to use,
>>> but I would recommend against using append().
>>>
>>> -Todd
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
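
Since Todd doesn't recall the exact API name, here is a minimal sketch of reading
the up-to-date length of a file under construction. It assumes a Hadoop 2.x-style
client, where the stream returned by FileSystem.open() is an HdfsDataInputStream
exposing getVisibleLength(); on 0.20-era builds such as CDH2 the class and method
names differ, so treat this as illustrative rather than exact:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

public class VisibleLength {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Open the file being written; the read pipeline, not the
        // NameNode metadata, knows how much of the last block is readable.
        FSDataInputStream in = fs.open(new Path(args[0]));
        try {
            if (in instanceof HdfsDataInputStream) {
                // Length visible to readers, including hflushed data
                // in the block still under construction.
                long len = ((HdfsDataInputStream) in).getVisibleLength();
                System.out.println("visible length: " + len);
            }
        } finally {
            in.close();
        }
    }
}

The point is that the visible length comes from the open stream, which consults
the datanodes about the block under construction, not from the NameNode metadata
that "hadoop fs -ls" reports.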
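
And a minimal sketch of the keep-the-file-open-and-hflush pattern Todd recommends
instead of append(), again against a Hadoop 2.x-style API. The file path is made
up for illustration, and on 0.20/CDH2 the equivalent call is sync() rather than
hflush():

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushWriter {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical path, for illustration only.
        FSDataOutputStream out = fs.create(new Path("/logs/events.log"));
        try {
            for (int i = 0; i < 100; i++) {
                out.writeBytes("record " + i + "\n");
                // Push the record to the datanodes so concurrent readers
                // can see it, without closing the file or updating the
                // length recorded in the NameNode. On 0.20/CDH2, use sync().
                out.hflush();
            }
        } finally {
            // Close once, when the file is complete; this is when the
            // length shown by "hadoop fs -ls" is finalized.
            out.close();
        }
    }
}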