Might it be a synchronization problem? I don't know whether Hadoop's DFS magically takes care of that, but if it doesn't, you might have a problem with multiple processes trying to write to the same file.

Perhaps as a control experiment you could run your job on some small input, making sure that each reduce task outputs to a different filename (I just use Math.random() * Integer.MAX_VALUE and cross my fingers).
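Something like this is what I mean (just a sketch; the helper name and the "mapred.task.id" fallback are assumptions about your setup):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class UniqueNames {
    // Derive a per-reduce-task output path. The task id (when present) is
    // safer than Math.random(), which only makes collisions unlikely,
    // not impossible.
    public static Path uniqueOutputPath(JobConf conf, String dir) {
        String suffix = conf.get("mapred.task.id",
                String.valueOf((long) (Math.random() * Integer.MAX_VALUE)));
        return new Path(dir, "part-" + suffix);
    }
}

Then each reduce task writes to uniqueOutputPath(conf, "/my/output") and no two tasks can collide on the same file.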
On Jun 18, 2008, at 6:01 PM, 晋光峰 wrote:

I'm sure I close all the files in the reduce step. Are there any other reasons that could cause this problem?

2008/6/18 Konstantin Shvachko <[EMAIL PROTECTED]>:

Did you close those files?
If not, they may be empty.
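For example, a minimal pattern (a sketch, with placeholder names) that guarantees the close happens even if a write throws:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SafeWrite {
    // If close() is skipped, buffered bytes may never be flushed to HDFS
    // and the file can end up empty even though the writes "succeeded".
    public static void write(FileSystem fs, Path path, String data)
            throws IOException {
        FSDataOutputStream out = fs.create(path);
        try {
            out.writeBytes(data);
        } finally {
            out.close();
        }
    }
}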



晋光峰 wrote:

Dears,

I use hadoop-0.16.4 to do some work and found an error whose cause I can't determine.

The scenario is like this: in the reduce step, instead of using
OutputCollector to write results, I use FSDataOutputStream to write results
to files on HDFS (because I want to split the results by some rules). After
the job finished, I found that *some* files (but not all) are empty on HDFS.
But I'm sure the files are not empty during the reduce step, since I added
some logging that reads back the generated files. It seems that some files'
contents are lost after the reduce step. Has anyone run into such errors, or
is it a Hadoop bug?

Please help me find the reason if any of you know it.

Thanks & Regards
Guangfeng




--
Guangfeng Jin

Software Engineer

iZENEsoft (Shanghai) Co., Ltd
Room 601 Marine Tower, No. 1 Pudong Ave.
Tel: 86-21-68860698
Fax: 86-21-68860699
Mobile: 86-13621906422
Company Website: www.izenesoft.com
