I get it. Thanks :)

Sent from my iPhone
On 2011-9-8, at 23:57, Todd Lipcon <t...@cloudera.com> wrote:

> 2011/9/8 Kanghua151 <kanghua...@msn.com>:
>> You are so nice, thank you very much :)
>> Last question:
>> Can I trigger block sync without restarting HDFS?
>
> Close the file or have a machine crash :) But no, not really.
>
>> Sent from my iPhone
>>
>> On 2011-9-8, at 15:00, Todd Lipcon <t...@cloudera.com> wrote:
>>
>>> 2011/9/7 kang hua <kanghua...@msn.com>:
>>>> Thanks, my friend!
>>>> Please allow me to ask more questions about the details.
>>>> 1. Yes, I can use "hadoop fs -tail" or "hadoop fs -cat xxx" to see
>>>> that file's content. But how can another process get the file's real
>>>> size if the namenode has not changed? What I really want is to read
>>>> the data at the tail of that file.
>>>
>>> You can open the file and then use an API on the DFSInputStream class
>>> to find the length. I don't recall the name of the API, but if you
>>> look in there, you should see it.
>>>
>>>> 2. Why is it that when I reboot HDFS, I can see the file content I
>>>> flushed, again via "hadoop fs -ls xxx"?
>>>
>>> On restart, the namenode triggers block synchronization, and the
>>> up-to-date length is determined.
>>>
>>>> 3. In append mode: if I close the file and reopen it in append mode
>>>> again and again, the real data size increases normally, but the
>>>> namenode shows DFS used space increasing too fast. Is it a bug?
>>>
>>> Might be a bug, yes.
>>>
>>>> 4. In which version of HDFS does append have no bugs?
>>>
>>> 0.21, which is buggy in other respects. So, no stable released version
>>> has a working append() call.
>>>
>>> In truth I've never seen a _good_ use case for
>>> append-to-an-existing-file. Usually you can do just as well by keeping
>>> the file open and periodically hflushing, or rolling to a new file
>>> when you want to add more records to an existing dataset.
>>>
>>> -Todd
>>>
>>>>> From: t...@cloudera.com
>>>>> Date: Wed, 7 Sep 2011 14:17:10 -0700
>>>>> Subject: Re: Question about hdfs close & hflush behavior
>>>>> To: hdfs-user@hadoop.apache.org
>>>>>
>>>>> 2011/9/7 kang hua <kanghua...@msn.com>:
>>>>>>
>>>>>> Hi friends:
>>>>>> I have two questions.
>>>>>> The first one is:
>>>>>> I use libhdfs's hflush to flush my data to a file; in the same
>>>>>> process context I can read it. But I find the file unchanged if I
>>>>>> check from the hadoop shell ---- its length is zero (checked by
>>>>>> "hadoop fs -ls xxx" or by reading it in a program). However, when
>>>>>> I reboot HDFS, I can again read the file content that I flushed.
>>>>>> Why?
>>>>>
>>>>> If we were to update the file metadata on hflush, it would be very
>>>>> expensive, since the metadata lives in the NameNode.
>>>>>
>>>>> If you do hadoop fs -cat xxx, you should see the entirety of the
>>>>> flushed data.
>>>>>
>>>>>> Can I hflush data to a file without closing it, and at the same
>>>>>> time read the flushed data from another process?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> The second one is:
>>>>>> Once an HDFS file is closed, is the last written block left
>>>>>> untouched? Even if I open that file in append mode, will the
>>>>>> namenode allocate a new block for the appended data?
>>>>>
>>>>> No, it reopens the last block of the existing file for append.
>>>>>
>>>>>> I find that if I close the file and open it in append mode again
>>>>>> and again, the hdfs report shows used space much greater than the
>>>>>> file's logical size.
>>>>>
>>>>> Not sure I follow what you mean by this. Can you give more detail?
>>>>>
>>>>>> btw: I use cloudera ch2
>>>>>
>>>>> The actual "append()" function has some bugs in all of the 0.20
>>>>> releases, including Cloudera's. The hflush/sync() API is fine to
>>>>> use, but I would recommend against using append().
>>>>>
>>>>> -Todd
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
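
To make Todd's pointer about DFSInputStream concrete: below is a minimal
sketch of how a second process can read the up-to-date ("visible") length of
a file that a writer is still hflushing. Todd says above that he does not
recall the exact method name, and it has moved between releases; the cast to
HdfsDataInputStream and its getVisibleLength() method come from later Hadoop
releases, so treat them as an assumption for 0.20-era clusters.

    // Sketch: ask the HDFS input stream for the visible length of a file
    // that another process is still writing. Unlike "hadoop fs -ls", this
    // does not rely on the (stale) length stored in the NameNode.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

    public class VisibleLength {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = fs.open(new Path(args[0]));
        try {
          // On HDFS the returned stream is HDFS-specific; the cast is the
          // version-dependent assumption noted above.
          long visible = ((HdfsDataInputStream) in).getVisibleLength();
          System.out.println("visible length: " + visible);
        } finally {
          in.close();
        }
      }
    }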
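
Todd's recommended alternative to append(), keeping one writer open and
flushing periodically, looks roughly like the following sketch. The file path
is made up for illustration, and on 0.20-era releases the call is sync()
rather than hflush().

    // Sketch: one long-lived writer that makes data visible to readers by
    // periodically flushing, instead of repeatedly close()-ing and
    // re-opening the file for append.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PeriodicFlushWriter {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical path, for illustration only.
        FSDataOutputStream out = fs.create(new Path("/logs/records.log"), true);
        try {
          for (int i = 0; i < 100; i++) {
            out.write(("record " + i + "\n").getBytes("UTF-8"));
            if (i % 10 == 9) {
              // Readers can now see everything written so far. This does
              // NOT update the length stored in the NameNode, which is why
              // "hadoop fs -ls" keeps showing a stale size until close().
              out.hflush(); // sync() on 0.20-era releases
            }
          }
        } finally {
          out.close(); // close() syncs the final length to the NameNode
        }
      }
    }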
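
For contrast, the close/re-append loop from question 3 would look like the
sketch below. Per Todd's answers, append() reopens the file's last block
rather than allocating a new one, but the 0.20-era implementations are buggy,
so this is the pattern he advises against; again the path is hypothetical.

    // Sketch: the repeated close-then-append pattern discussed in the
    // thread. append() resumes writing into the existing last block.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendLoop {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/data/records.log"); // hypothetical path
        if (!fs.exists(p)) {
          fs.create(p).close(); // append() requires an existing file
        }
        for (int i = 0; i < 10; i++) {
          FSDataOutputStream out = fs.append(p);
          out.write(("batch " + i + "\n").getBytes("UTF-8"));
          out.close();
        }
      }
    }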