Might it be a synchronization problem? I don't know whether Hadoop's DFS
magically takes care of that, but if it doesn't, you might have a
problem with multiple processes trying to write to the same file.
Perhaps as a control experiment you could run your job on some
small input, making sure that each reduce task writes to a different
filename (I just use Math.random()*Integer.MAX_VALUE and cross my
fingers).
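
To make the control experiment concrete, here is roughly what I mean,
sketched against the old org.apache.hadoop.mapred API (the output
directory is made up; this is only a sketch, not tested code):

    // Inside the reducer's configure(JobConf conf): pick a name that no
    // other reduce task should pick, then open that file on HDFS.
    long suffix = (long) (Math.random() * Integer.MAX_VALUE);
    Path outFile = new Path("/tmp/control-output/part-" + suffix);  // hypothetical directory
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.create(outFile);

As far as I remember, create() overwrites an existing file by default,
so if two reduce tasks ever pick the same name, one task's output can
silently replace the other's.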
On Jun 18, 2008, at 6:01 PM, 晋光峰 wrote:
I'm sure I close all the files in the reduce step. Are there any other
reasons that could cause this problem?
2008/6/18 Konstantin Shvachko <[EMAIL PROTECTED]>:
Did you close those files?
If not, they may be empty.
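
For example, if you keep the stream as a field of your Reducer, it
should be closed in close(). A rough sketch against the old mapred API
(the field name is just illustrative):

    private FSDataOutputStream out;   // opened in configure(), written in reduce()

    public void close() throws IOException {
        if (out != null) {
            out.close();   // flushes buffered data to HDFS; without this the file can stay empty
        }
    }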
晋光峰 wrote:
Dears,
I use hadoop-0.16.4 to do some work and found an error whose cause I
can't figure out.
The scenario is like this: in the reduce step, instead of using
OutputCollector to write results, I use FSDataOutputStream to write
results to files on HDFS (because I want to split the results by some
rules). After the job finished, I found that *some* files (but not all)
are empty on HDFS. But I'm sure that in the reduce step the files are
not empty, since I added some logging that reads the generated files.
It seems that some files' contents are lost after the reduce step.
Has anyone happened to face such errors? Or is it a Hadoop bug?
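
Roughly, my reduce step looks like the sketch below (heavily
simplified: the real splitting rules are omitted, the paths are
placeholders, and using mapred.task.id as a suffix is just one way to
keep the filenames distinct per task, assuming that property is set in
this version):

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class SplitResultReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        private FSDataOutputStream out;

        public void configure(JobConf conf) {
            try {
                // One output file per reduce task; the real code picks
                // the file according to my splitting rules.
                FileSystem fs = FileSystem.get(conf);
                out = fs.create(new Path("/user/guangfeng/split/"
                        + conf.get("mapred.task.id")));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            while (values.hasNext()) {
                // Write directly to my own HDFS file instead of output.collect(...).
                out.writeBytes(key.toString() + "\t" + values.next().toString() + "\n");
            }
        }

        public void close() throws IOException {
            out.close();
        }
    }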
Please help me find the reason if any of you know it.
Thanks & Regards
Guangfeng
--
Guangfeng Jin
Software Engineer
iZENEsoft (Shanghai) Co., Ltd
Room 601 Marine Tower, No. 1 Pudong Ave.
Tel: 86-21-68860698
Fax: 86-21-68860699
Mobile: 86-13621906422
Company Website: www.izenesoft.com