2011/9/8 Kanghua151 <kanghua...@msn.com>:
> You are so nice, thank you very much :)
> Last question:
> Can I trigger block sync without restarting hdfs?
Close the file or have a machine crash :) But no, not really.

>
> Sent from my iPhone
>
> On 2011-9-8, at 15:00, Todd Lipcon <t...@cloudera.com> wrote:
>
>> 2011/9/7 kang hua <kanghua...@msn.com>:
>>> Thanks my friend!
>>> Please allow me to ask more questions about the details!
>>> 1. Yes, I can use hadoop fs -tail or -cat xxx to see that file's content. But
>>> how can I get that file's real size from another process if the namenode has
>>> not changed? What I really want is to read the data at the tail of that file.
>>
>> You can open the file and then use an API on the DFSInputStream class
>> to find the length. I don't recall the name of the API, but if you
>> look in there, you should see it.
>>
>>> 2. Why is it that "when I reboot hdfs, I can see that file's content that I
>>> flushed" again by "hadoop fs -ls xxx"?
>>
>> On restart, the namenode triggers block synchronization, and the
>> up-to-date length is determined.
>>
>>> 3. In append mode: if I close the file and open it in append mode again and
>>> again, the real data space increases normally, but the namenode shows dfs used
>>> space increasing too fast. Is it a bug?
>>
>> Might be a bug, yes.
>>
>>> 4. In which version of hdfs does append have no bugs?
>>
>> 0.21, which is buggy in other aspects. So, no stable released version
>> has a working append() call.
>>
>> In truth I've never seen a _good_ use case for
>> append-to-an-existing-file. Usually you can do just as well by keeping
>> the file open and periodically hflushing, or rolling to a new file
>> when you want to add more records to an existing dataset.
>>
>> -Todd
>>
>>>> From: t...@cloudera.com
>>>> Date: Wed, 7 Sep 2011 14:17:10 -0700
>>>> Subject: Re: Question about hdfs close & hflush behavior
>>>> To: hdfs-user@hadoop.apache.org
>>>>
>>>> 2011/9/7 kang hua <kanghua...@msn.com>:
>>>>>
>>>>> Hi friends:
>>>>> I have two questions.
>>>>> The first one is:
>>>>> I use libhdfs's hflush to flush my data to a file; in the same process
>>>>> context I can read it. But I find the file unchanged if I check from the
>>>>> hadoop shell ---- its length is zero (checked by "hadoop fs -ls xxx" or by
>>>>> reading it in a program); however, when I reboot hdfs, I can read the file's
>>>>> content that I flushed. Why?
>>>>
>>>> If we were to update the file metadata on hflush, it would be very
>>>> expensive, since the metadata lives in the NameNode.
>>>>
>>>> If you do hadoop fs -cat xxx, you should see the entirety of the flushed
>>>> data.
>>>>
>>>>> Can I hflush data to a file without closing it, and at the same time read
>>>>> the data flushed by another process?
>>>>
>>>> Yes.
>>>>
>>>>> The second one is:
>>>>> Once an hdfs file is closed, is the last written block left untouched? Even
>>>>> if I open the file in append mode, will the namenode allocate a new block
>>>>> for the appended data?
>>>>
>>>> No, it reopens the last block of the existing file for append.
>>>>
>>>>> I find that if I close a file and open it in append mode again and again,
>>>>> the hdfs report will show "used space much more than the file's logical size".
>>>>
>>>> Not sure I follow what you mean by this. Can you give more detail?
>>>>
>>>>> btw: I use cloudera ch2
>>>>
>>>> The actual "append()" function has some bugs in all of the 0.20
>>>> releases, including Cloudera's. The hflush/sync() API is fine to use,
>>>> but I would recommend against using append().
>>>>
>>>> -Todd
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>
--
Todd Lipcon
Software Engineer, Cloudera
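
For reference, a minimal sketch of the visible-length lookup Todd describes
above: open the file and ask the HDFS input stream for the length readable
right now, including hflushed data in the still-open last block. Todd doesn't
recall the exact method name; the cast target and getVisibleLength() below are
assumptions about the 0.20-append-era client, so check your DFSInputStream
source. The cluster URI and path are made up.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DFSClient;

    public class VisibleLength {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed cluster URI; substitute your fs.default.name.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

        // FileStatus.getLen() from the NameNode lags behind hflushed data,
        // because the last block is still under construction. Opening the
        // file reaches the DataNodes, which do see the hflushed bytes.
        FSDataInputStream in = fs.open(new Path("/logs/app.log"));
        try {
          // On 0.20-era HDFS, open() on a DistributedFileSystem returns a
          // DFSClient.DFSDataInputStream wrapping DFSInputStream.
          // getVisibleLength() is the method name assumed here.
          DFSClient.DFSDataInputStream din = (DFSClient.DFSDataInputStream) in;
          System.out.println("visible length = " + din.getVisibleLength());
        } finally {
          in.close();
        }
      }
    }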
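
And a sketch of the pattern Todd recommends instead of append(): keep the
writer open, flush periodically, and roll to a new file when you want to add
more records. The paths, roll threshold, and record source are made up for
illustration; on 0.20 the flush call is sync(), hflush() being the 0.21+ name.

    import java.net.URI;
    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RollingWriter {
      // Arbitrary roll threshold, chosen for illustration.
      private static final long ROLL_BYTES = 64L * 1024 * 1024;

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

        int part = 0;
        FSDataOutputStream out = fs.create(new Path("/data/records." + part));
        for (String record : Arrays.asList("rec1", "rec2", "rec3")) { // stand-in records
          out.write((record + "\n").getBytes("UTF-8"));
          // Make the data visible to readers without closing the file.
          // On 0.20 this is sync(); hflush() replaces it in 0.21+.
          out.sync();
          if (out.getPos() >= ROLL_BYTES) {
            // Instead of reopening with append(), close and start a new part.
            out.close();
            part++;
            out = fs.create(new Path("/data/records." + part));
          }
        }
        out.close();
      }
    }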