I'm having a similar problem, but with the hadoop CLI tool (not
programmatically), and it's driving me nuts:

[EMAIL PROTECTED]:~/nutch/trunk$ cat urls/urls.txt
http://escert.upc.edu/

[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -ls
Found 0 items
[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -put urls urls

[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -ls
Found 1 items
/user/hadoop/urls       <dir>           2008-06-26 17:20        rwxr-xr-x       
hadoop  supergroup
[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -ls urls
Found 1 items
/user/hadoop/urls/urls.txt      <r 1>   0       2008-06-26 17:20        
rw-r--r--       hadoop  supergroup

[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -cat urls/urls.txt
[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -get urls/urls.txt .
[EMAIL PROTECTED]:~/nutch/trunk$ cat urls.txt
[EMAIL PROTECTED]:~/nutch/trunk$

As you can see, I put a text file containing one line onto HDFS from the
local filesystem, but afterwards the file is empty... Am I missing some
"close", "flush", or "commit" command?

Thanks in advance,
Roman

On Thu, Jun 19, 2008 at 7:23 PM, Mori Bellamy <[EMAIL PROTECTED]> wrote:
> Might it be a synchronization problem? I don't know if Hadoop's DFS magically
> takes care of that, but if it doesn't, then you might have a problem because
> of multiple processes trying to write to the same file.
>
> Perhaps as a control experiment you could run your process on some small
> input, making sure that each reduce task outputs to a different filename (I
> just use Math.random()*Integer.MAX_VALUE and cross my fingers).
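
Interjecting here: instead of Math.random(), I believe the task attempt id
makes a collision-free suffix. A rough sketch of what I mean, assuming a
"mapred.task.id" property is set for each task in this version (the property
name is my assumption, please verify against 0.16):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class TaskIdSuffix extends MapReduceBase {
    private String suffix;

    public void configure(JobConf job) {
        // The attempt id is unique per reduce attempt (it ends in something
        // like _r_000003_0), so two attempts can never race each other onto
        // the same filename the way random suffixes might.
        suffix = job.get("mapred.task.id", "unknown");
    }

    public String outputName(String base) {
        return base + "-" + suffix;
    }
}

My (limited) understanding is that the framework sets this for both map and
reduce tasks, but I'd double-check before relying on it.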
> On Jun 18, 2008, at 6:01 PM, 晋光峰 wrote:
>
>> I'm sure I close all the files in the reduce step. Are there any other
>> reasons that could cause this problem?
>>
>> 2008/6/18 Konstantin Shvachko <[EMAIL PROTECTED]>:
>>
>>> Did you close those files?
>>> If not, they may be empty.
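
To make that concrete for my reading of Guangfeng's setup below (several
files split by rules, written with FSDataOutputStream from the reduce step):
every one of those streams has to be closed before the task exits, e.g. in
the Reducer's close(). A sketch against the old mapred API; the bucket rule
and output paths are invented for illustration:

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SplittingReducer extends MapReduceBase implements Reducer {

    private FileSystem fs;
    private String taskId;
    // One open stream per "rule"/bucket; every one must be closed in close().
    private final Map streams = new HashMap();

    public void configure(JobConf job) {
        try {
            fs = FileSystem.get(job);
            // Per-attempt suffix so speculative attempts don't collide;
            // the property name is my assumption for this version.
            taskId = job.get("mapred.task.id", "task");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private FSDataOutputStream streamFor(String bucket) throws IOException {
        FSDataOutputStream out = (FSDataOutputStream) streams.get(bucket);
        if (out == null) {
            // Hypothetical layout: one file per bucket per task attempt.
            out = fs.create(new Path("results/" + bucket + "-" + taskId));
            streams.put(bucket, out);
        }
        return out;
    }

    public void reduce(WritableComparable key, Iterator values,
                       OutputCollector output, Reporter reporter)
            throws IOException {
        // Made-up splitting rule, just to show the pattern.
        String bucket = key.toString().startsWith("http") ? "urls" : "other";
        FSDataOutputStream out = streamFor(bucket);
        while (values.hasNext()) {
            out.writeBytes(key.toString() + "\t" + values.next() + "\n");
        }
    }

    public void close() throws IOException {
        // If any stream is left open, its buffered data never reaches the
        // DataNodes and that file shows up empty on HDFS.
        for (Iterator it = streams.values().iterator(); it.hasNext();) {
            ((FSDataOutputStream) it.next()).close();
        }
    }
}

If an exception or early return skips close() on even one stream, only that
file's data is lost, which would match "some files (but not all) are empty".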
>>>
>>>
>>>
>>> 晋光峰 wrote:
>>>
>>>> Dear all,
>>>>
>>>> I am using hadoop-0.16.4 to do some work and have found an error whose
>>>> cause I cannot figure out.
>>>>
>>>> The scenario is like this: in the reduce step, instead of using
>>>> OutputCollector to write the results, I use FSDataOutputStream to write
>>>> them to files on HDFS (because I want to split the results by some rules).
>>>> After the job finished, I found that *some* of the files (but not all) are
>>>> empty on HDFS. But I'm sure the files were not empty during the reduce
>>>> step, since I added some logging that reads back each generated file. It
>>>> seems that some files' contents are lost after the reduce step. Has anyone
>>>> run into such errors, or is this a Hadoop bug?
>>>>
>>>> Please help me find the reason if any of you know it.
>>>>
>>>> Thanks & Regards
>>>> Guangfeng
>>>>
>>>>
>>
>>
>> --
>> Guangfeng Jin
>>
>> Software Engineer
>>
>> iZENEsoft (Shanghai) Co., Ltd
>> Room 601 Marine Tower, No. 1 Pudong Ave.
>> Tel: 86-21-68860698
>> Fax: 86-21-68860699
>> Mobile: 86-13621906422
>> Company Website: www.izenesoft.com
>
>
