I am curious whether your data got corrupted when you transferred your
file into HDFS. I recently had a very similar situation to yours, where
about 5 lines with decimal points got corrupted. It was only when I
transferred the file back out of HDFS and compared it to the original
that I finally figured out what was wrong. I don't have an answer to
your specific question, but I am curious whether you experienced the
same thing that I did.
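
In case it helps, here is roughly how I did the comparison after pulling
the file back out with hadoop fs -get. This is just a minimal sketch; the
file names are placeholders:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

// Compares an MD5 checksum of the original local file against the copy
// retrieved from HDFS; a mismatch means something changed in transit.
public class ChecksumCompare {

  private static String md5(String path) throws Exception {
    byte[] data = Files.readAllBytes(Paths.get(path));
    byte[] digest = MessageDigest.getInstance("MD5").digest(data);
    StringBuilder sb = new StringBuilder();
    for (byte b : digest) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    String original = md5("input-original.txt");   // local original
    String roundTrip = md5("input-from-hdfs.txt"); // hadoop fs -get copy
    System.out.println(original.equals(roundTrip)
        ? "Files match"
        : "Files differ: possible corruption in transit");
  }
}

A plain diff of the two files works too; the checksums just make it
obvious at a glance whether anything changed at all.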

________________________________
From: Boyu Zhang <[email protected]>
To: [email protected]; [email protected]
Sent: Fri, October 15, 2010 5:02:08 PM
Subject: Corrupted input data to map

Hi all,

I am running a job whose input is 1 million lines of data; among those 1
million lines, 5 or 6 are corrupted. The way they are corrupted is: in a
position where a float number is expected, like 3.4, there is instead
something like 3.4.5.6. So when the map runs, it throws a "multiple
points" NumberFormatException.

My question is: the map tasks that hit the exception are marked as
failed, but what about the records the same map processed before the
exception? Do they reach the reduce tasks, or are they treated as
garbage? Thank you very much; any help is appreciated.
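
In case the details matter: the exception is the one Java's
Float.parseFloat throws for input like 3.4.5.6 ("multiple points"). One
option I am considering is to catch it and skip the bad records instead
of letting the whole task fail; a minimal sketch (new mapreduce API,
names made up):

import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that skips corrupted records instead of failing.
public class DefensiveParseMapper
    extends Mapper<LongWritable, Text, Text, FloatWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString().trim();
    try {
      // Float.parseFloat("3.4.5.6") throws NumberFormatException
      // ("multiple points"); a good line like "3.4" parses fine.
      float f = Float.parseFloat(line);
      context.write(new Text("value"), new FloatWritable(f));
    } catch (NumberFormatException e) {
      // Count and drop the corrupted line rather than failing the task.
      context.getCounter("DataQuality", "CORRUPTED_LINES").increment(1);
    }
  }
}

With something like this, the 5 or 6 corrupted lines would just show up
in a counter instead of failing the task attempt.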

Boyu