Forgot reducer :)

---------- Forwarded message ----------
From: jamal sasha <[email protected]>
Date: Mon, Nov 19, 2012 at 8:17 PM
Subject: debugging hadoop streaming programs (first code)
To: [email protected]
Hi,

This is my first attempt at learning the MapReduce abstraction. My problem is as follows. I have a text file like this:

id 1, id2, date,time,mrps,code,code2
3710100022400, 1350219887, 2011-09-10, 12:39:38.000, 99.00, 1, 0
3710100022400, 5045462785, 2011-09-06, 13:23:00.000, 70.63, 1, 0

What I want to do is count the number of transactions in every half-hour interval between 7 am and 11 am. Here are the intervals:

7-7:30 -> 0
7:30-8 -> 1
8-8:30 -> 2
....
10:30-11 -> 7

So ultimately I am building a 2D dictionary, d[id2][interval] = count_transactions.

My mapper and reducer are attached (along with sample input). The code runs fine when I run it locally via

cat input.txt | python mapper.py | sort | python reducer.py

and gives me the output, but when I run it on the cluster it throws an error that is not helpful (the terminal basically just says the job was unsuccessful, reason NA). Any suggestions on what I am doing wrong?

Jamal
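The attached mapper.py is not reproduced in this thread, so what follows is only a minimal sketch of a mapper for the counting described above. It assumes the seven comma-separated columns shown in the sample, uses id2 as the key, and emits one "id2<TAB>interval" line per transaction inside the 7-11 am window; the function names are hypothetical. It also writes rejected lines to stderr, which Hadoop keeps in the per-task logs. That is one way to see why a job that works locally fails on the cluster (for example, a header or malformed row that the local test file never contained).

```python
#!/usr/bin/env python
"""Hypothetical mapper.py sketch: emit "id2<TAB>interval" for every
transaction between 07:00 and 11:00. Assumed column order:
id1, id2, date, time, mrps, code, code2."""
import sys

START_HOUR = 7   # window start (7 am)
END_HOUR = 11    # window end (11 am), exclusive


def interval_of(time_str):
    """Map 'HH:MM:SS.fff' to a half-hour bucket 0..7, or None outside 7-11 am.
    Raises ValueError if the string is not a time."""
    hours, minutes = time_str.strip().split(":")[:2]
    hours, minutes = int(hours), int(minutes)
    if not START_HOUR <= hours < END_HOUR:
        return None
    return (hours - START_HOUR) * 2 + (1 if minutes >= 30 else 0)


def map_line(line):
    """Return (id2, interval), or None if the row is outside the window.
    Raises ValueError for rows that cannot be parsed (e.g. the header)."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 7:
        raise ValueError("expected 7 columns, got %d" % len(fields))
    bucket = interval_of(fields[3])
    return None if bucket is None else (fields[1], bucket)


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    for line in stdin:
        if not line.strip():
            continue  # ignore blank lines
        try:
            result = map_line(line)
        except ValueError as exc:
            # stderr goes to the task attempt logs, not to the job output
            stderr.write("skipping malformed line %r: %s\n" % (line, exc))
            continue
        if result is not None:
            stdout.write("%s\t%d\n" % result)


if __name__ == "__main__":
    main()
```

For example, 09:13:41 falls in bucket (9 - 7) * 2 + 0 = 4, and 10:45 falls in bucket 7, matching the interval table above.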
3710100022400, 1350219887, 2011-09-10, 12:39:38.000, 99.00, 1, 0
3710100022400, 5045462785, 2011-09-06, 13:23:00.000, 70.63, 1, 0
3710100033700, 6312910037, 2011-09-08, 00:23:51.000, 39.95, 0, 0
3710100033700, 2206704868, 2011-09-06, 09:13:41.000, 62.55, 0, 0
3710100033700, 2185901683, 2011-09-05, 13:57:30.000, 180.16, 0, 0
3710100033700, 6310746201, 2011-09-09, 08:27:57.000, 176.78, 1, 0
3710100048800, 3480013846, 2011-09-08, 19:05:17.000, 17.00, 0, 0
3710100048800, 1420938987, 2011-09-10, 18:47:50.000, 99.00, 1, 0
3710100048801, 4455703082, 2011-09-06, 13:24:58.000, 42.01, 1, 0
3710100048801, 4452115801, 2011-09-11, 09:09:45.000, 25.17, 1, 0
3710100048801, 4452115801, 2011-09-11, 09:15:21.000, 7.88, 1, 0
3710100048801, 4450426010, 2011-09-10, 07:12:35.000, 16.85, 1, 0
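Only five of the sample rows above fall between 7 and 11 am: 2206704868 at 09:13 (bucket 4), 6310746201 at 08:27 (bucket 2), 4452115801 twice at 09:09 and 09:15 (bucket 4), and 4450426010 at 07:12 (bucket 0). The attached reducer.py is not shown in the thread, so here is a minimal reducer sketch under the assumption that the mapper emits one "id2<TAB>interval" line per transaction; the names are hypothetical. By default Hadoop Streaming treats the text before the first tab as the key, so only id2 is guaranteed to arrive grouped; the per-id2 Counter therefore does not rely on the intervals being sorted within a group, unlike a local `sort`, which orders whole lines.

```python
#!/usr/bin/env python
"""Hypothetical reducer.py sketch: read "id2<TAB>interval" lines grouped
by id2 and emit "id2<TAB>interval<TAB>count" for every non-empty bucket."""
import sys
from collections import Counter
from itertools import groupby


def parse(line):
    """Split one mapper output line into (id2, interval)."""
    id2, interval = line.rstrip("\n").split("\t")
    return id2, int(interval)


def reduce_lines(lines):
    """Yield (id2, interval, count); assumes lines arrive grouped by id2."""
    records = (parse(line) for line in lines if line.strip())
    for id2, group in groupby(records, key=lambda rec: rec[0]):
        # One Counter per id2 plays the role of the row d[id2]
        counts = Counter(interval for _, interval in group)
        for interval in sorted(counts):
            yield id2, interval, counts[interval]


def main(stdin=sys.stdin, stdout=sys.stdout):
    for id2, interval, count in reduce_lines(stdin):
        stdout.write("%s\t%d\t%d\n" % (id2, interval, count))


if __name__ == "__main__":
    main()
```

Fed the sorted mapper output for the five rows listed above, this yields one line per (id2, interval) pair, with a count of 2 for 4452115801 in bucket 4 and 1 for the others. Memory use is bounded by the number of distinct intervals per id2 rather than by the whole 2D dictionary.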
