I get it. Thanks :)

Sent from my iPhone
On 2011-9-8, at 23:57, Todd Lipcon <t...@cloudera.com> wrote:

> 2011/9/8 Kanghua151 <kanghua...@msn.com>:
>> You are so nice, thank you very much :)
>> Last question:
>> Can I trigger block sync without restarting HDFS?
>
> Close the file or have a machine crash :) But no, not really.
>
>> Sent from my iPhone
>>
>> On 2011-9-8, at 15:00, Todd Lipcon <t...@cloudera.com> wrote:
>>
>>> 2011/9/7 kang hua <kanghua...@msn.com>:
>>>> Thanks, my friend!
>>>> Please allow me to ask more questions about the details.
>>>> 1. Yes, I can use "hadoop fs -tail" or "hadoop fs -cat xxx" to see
>>>> that file's content. But how can another process get the file's real
>>>> size if the namenode has not changed? What I really want is to read
>>>> the data at the tail of that file.
>>>
>>> You can open the file and then use an API on the DFSInputStream class
>>> to find the length. I don't recall the name of the API, but if you
>>> look in there, you should see it.
>>>
>>>> 2. Why is it that when I reboot HDFS, I can see the file content I
>>>> flushed, again via "hadoop fs -ls xxx"?
>>>
>>> On restart, the namenode triggers block synchronization, and the
>>> up-to-date length is determined.
>>>
>>>> 3. In append mode: if I close the file and reopen it in append mode
>>>> again and again, the real data size increases normally, but the
>>>> namenode shows DFS used space increasing too fast. Is it a bug?
>>>
>>> Might be a bug, yes.
>>>
>>>> 4. In which version of HDFS does append have no bugs?
>>>
>>> 0.21, which is buggy in other respects. So, no stable released version
>>> has a working append() call.
>>>
>>> In truth I've never seen a _good_ use case for
>>> append-to-an-existing-file. Usually you can do just as well by keeping
>>> the file open and periodically hflushing, or rolling to a new file
>>> when you want to add more records to an existing dataset.
>>>
>>> -Todd
>>>
>>>>> From: t...@cloudera.com
>>>>> Date: Wed, 7 Sep 2011 14:17:10 -0700
>>>>> Subject: Re: Question about hdfs close & hflush behavior
>>>>> To: hdfs-user@hadoop.apache.org
>>>>>
>>>>> 2011/9/7 kang hua <kanghua...@msn.com>:
>>>>>>
>>>>>> Hi friends:
>>>>>> I have two questions.
>>>>>> The first one is:
>>>>>> I use libhdfs's hflush to flush my data to a file; in the same
>>>>>> process context I can read it. But I find the file unchanged if I
>>>>>> check from the hadoop shell ---- its length is zero (checked by
>>>>>> "hadoop fs -ls xxx" or by reading it in a program). However, when
>>>>>> I reboot HDFS, I can again read the file content that I flushed.
>>>>>> Why?
>>>>>
>>>>> If we were to update the file metadata on hflush, it would be very
>>>>> expensive, since the metadata lives in the NameNode.
>>>>>
>>>>> If you do hadoop fs -cat xxx, you should see the entirety of the
>>>>> flushed data.
>>>>>
>>>>>> Can I hflush data to a file without closing it, and at the same
>>>>>> time read the flushed data from another process?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> The second one is:
>>>>>> Once an HDFS file is closed, is the last written block left
>>>>>> untouched? Even if I open that file in append mode, will the
>>>>>> namenode allocate a new block for the appended data?
>>>>>
>>>>> No, it reopens the last block of the existing file for append.
>>>>>
>>>>>> I find that if I close the file and open it in append mode again
>>>>>> and again, the hdfs report shows used space much greater than the
>>>>>> file's logical size.
>>>>>
>>>>> Not sure I follow what you mean by this. Can you give more detail?
>>>>>
>>>>>> btw: I use cloudera ch2
>>>>>
>>>>> The actual "append()" function has some bugs in all of the 0.20
>>>>> releases, including Cloudera's. The hflush/sync() API is fine to
>>>>> use, but I would recommend against using append().
>>>>>
>>>>> -Todd
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
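
To make Todd's pointer about DFSInputStream concrete: below is a minimal
sketch of how a second process can read the up-to-date ("visible") length of
a file that a writer is still hflushing. Todd says above that he does not
recall the exact method name, and it has moved between releases; the cast to
HdfsDataInputStream and its getVisibleLength() method come from later Hadoop
releases, so treat them as an assumption for 0.20-era clusters.

    // Sketch: ask the HDFS input stream for the visible length of a file
    // that another process is still writing. Unlike "hadoop fs -ls", this
    // does not rely on the (stale) length stored in the NameNode.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

    public class VisibleLength {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = fs.open(new Path(args[0]));
        try {
          // On HDFS the returned stream is HDFS-specific; the cast is the
          // version-dependent assumption noted above.
          long visible = ((HdfsDataInputStream) in).getVisibleLength();
          System.out.println("visible length: " + visible);
        } finally {
          in.close();
        }
      }
    }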
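
Todd's recommended alternative to append(), keeping one writer open and
flushing periodically, looks roughly like the following sketch. The file path
is made up for illustration, and on 0.20-era releases the call is sync()
rather than hflush().

    // Sketch: one long-lived writer that makes data visible to readers by
    // periodically flushing, instead of repeatedly close()-ing and
    // re-opening the file for append.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PeriodicFlushWriter {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical path, for illustration only.
        FSDataOutputStream out = fs.create(new Path("/logs/records.log"), true);
        try {
          for (int i = 0; i < 100; i++) {
            out.write(("record " + i + "\n").getBytes("UTF-8"));
            if (i % 10 == 9) {
              // Readers can now see everything written so far. This does
              // NOT update the length stored in the NameNode, which is why
              // "hadoop fs -ls" keeps showing a stale size until close().
              out.hflush(); // sync() on 0.20-era releases
            }
          }
        } finally {
          out.close(); // close() syncs the final length to the NameNode
        }
      }
    }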
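
For contrast, the close/re-append loop from question 3 would look like the
sketch below. Per Todd's answers, append() reopens the file's last block
rather than allocating a new one, but the 0.20-era implementations are buggy,
so this is the pattern he advises against; again the path is hypothetical.

    // Sketch: the repeated close-then-append pattern discussed in the
    // thread. append() resumes writing into the existing last block.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendLoop {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/data/records.log"); // hypothetical path
        if (!fs.exists(p)) {
          fs.create(p).close(); // append() requires an existing file
        }
        for (int i = 0; i < 10; i++) {
          FSDataOutputStream out = fs.append(p);
          out.write(("batch " + i + "\n").getBytes("UTF-8"));
          out.close();
        }
      }
    }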