I am curious whether your data got corrupted when you transferred the file into HDFS. I recently hit a very similar situation, where about 5 lines of decimal values were corrupted. I only figured out what was wrong when I transferred the file back out of HDFS and compared it to the original. I don't have an answer to your specific question, but I wonder if you ran into the same thing I did.
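For what it's worth, here is a minimal sketch of the round-trip check I described: stream the copy stored in HDFS and the original local file side by side and report the first byte offset where they differ. The paths are hypothetical; substitute your own.

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CompareHdfsToLocal {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            try (InputStream hdfsIn = fs.open(new Path("/user/me/input.txt"));   // hypothetical path
                 InputStream localIn = new BufferedInputStream(
                         new FileInputStream("input.txt"))) {                    // hypothetical path
                long pos = 0;
                while (true) {
                    int a = hdfsIn.read();
                    int b = localIn.read();
                    if (a != b) {
                        // First mismatch: one stream differs (or ended early).
                        System.out.println("Files differ at byte offset " + pos);
                        return;
                    }
                    if (a == -1) break;  // both streams hit EOF together
                    pos++;
                }
                System.out.println("Files are identical (" + pos + " bytes)");
            }
        }
    }

A plain byte comparison like this is slower than comparing checksums, but it tells you exactly where the corruption is, which is what let me spot the bad lines.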
________________________________
From: Boyu Zhang <[email protected]>
To: [email protected]; [email protected]
Sent: Fri, October 15, 2010 5:02:08 PM
Subject: Corrupted input data to map

Hi all,

I am running a program whose input is 1 million lines of data, and 5 or 6 of those lines are corrupted. The way they are corrupted is that in a position where a float like 3.4 is expected, something like 3.4.5.6 appears instead. So when the map runs, it throws a "multiple points" NumberFormatException.

My question is: the map tasks that hit the exception are marked as failed, but what about the data processed by the same map before the exception? Does it reach the reduce task, or is it treated as garbage?

Thank you very much, any help is appreciated.

Boyu
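Not an answer to the failure-semantics question, but one common way to keep a handful of malformed lines from failing the whole map task is to catch the parse exception in the mapper, skip the record, and count it so the corruption stays visible in the job counters. This is only a sketch, not code from the thread; the field layout, class names, and counter names are hypothetical.

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TolerantMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\s+");  // hypothetical record layout
            try {
                // A value like "3.4.5.6" makes parseDouble throw a
                // NumberFormatException with the "multiple points" message.
                double v = Double.parseDouble(fields[1]);
                context.write(new Text(fields[0]), new DoubleWritable(v));
            } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
                // Skip the corrupted line instead of failing the task, but
                // increment a counter so the bad records show up in the job UI.
                context.getCounter("Data", "MalformedRecords").increment(1);
            }
        }
    }

With only 5 or 6 bad lines out of a million, counting and skipping is usually preferable to letting the task attempt fail and be rescheduled.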
