I have an application I would like to apply Hadoop to, but I'm not sure if the tasks are too small. I have a file that contains between 70,000 and 400,000 records. All the records can be processed in parallel, and I can currently process them at about 400 records a second single-threaded (give or take).

I thought I read somewhere (in one of the tutorials) that mapper tasks should run for at least a minute to offset the overhead of creating them. Is this really the case? I'm pretty sure that mapping one record per mapper is overkill, but I'm wondering whether batching records up for each mapper is still the way to go, or whether I should look at some other framework to help split up the processing.
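To make the batching idea concrete, something like the following is what I had in mind: a rough sketch using NLineInputFormat from the newer mapreduce API so each map task gets a fixed chunk of lines rather than one record each. The class names and the 24,000 figure (roughly 400 records/s x 60 s) are just placeholders, not a working job.

    // Sketch only: batch ~24,000 records per map task so each mapper runs
    // for about a minute instead of starting a task per record.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BatchedRecordJob {

        // Hypothetical mapper: still processes one record per map() call,
        // but each task is handed a batch of lines, amortizing startup cost.
        public static class RecordMapper
                extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable offset, Text record, Context ctx)
                    throws java.io.IOException, InterruptedException {
                // ... per-record processing would go here ...
                ctx.write(new Text(offset.toString()), record);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "batched-record-job");
            job.setJarByClass(BatchedRecordJob.class);
            job.setMapperClass(RecordMapper.class);
            job.setNumReduceTasks(0);                  // map-only job

            job.setInputFormatClass(NLineInputFormat.class);
            // ~24,000 records per split at ~400 rec/s keeps each mapper busy ~1 minute
            NLineInputFormat.setNumLinesPerSplit(job, 24000);
            NLineInputFormat.addInputPath(job, new Path(args[0]));

            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Is this the sort of batching people actually do in practice, or is there a better-suited approach for a job this small?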
Any insight would be appreciated.

Thanks,
Chris