Hey Chris,
I think it would be appropriate. Look at it this way: at 400 records a
second, one mapper processes about 24k records a minute, so roughly 17
mappers could chew through your largest file (400k records) in one minute.
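Spelled out, that back-of-envelope math (the 400 records/sec and 400k-record figures come from Chris's message below) is just:

```python
# Quick check of the mapper count above, using the numbers from the thread.
import math

records_per_second = 400                              # Chris's single-threaded rate
records_per_mapper_minute = records_per_second * 60   # ~24k records per mapper-minute
largest_input = 400_000                               # upper bound on his file size

mappers = math.ceil(largest_input / records_per_mapper_minute)
print(mappers)  # -> 17
```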
Even if you still think your problem is too small, consider:
1) The possibility of growth in your application. Your processing
becomes "future proof" - you have a pretty solid way to scale out as
your task grows. Just add new machines -- you don't have to invest in
a "small scale" framework and then rewrite it in a year.
2) The benefits of having a framework do the heavy lifting. There's a
surprising amount of "roll your own" that you end up doing when you
decide to break out of a single thread. By framing your problem as a
map-reduce problem, you get to skip a lot of these steps and just
focus on solving your problem (also: beware that it's very sexy to
build your own MapReduce framework. Anything which is "very sexy"
takes up more time and money than you think possible at the outset).
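On the batching question: with the old-style mapred API, Hadoop ships an NLineInputFormat that hands each map task a fixed number of input lines, so you can tune the batch size until a task runs about a minute. A rough sketch of the driver configuration (class and property names as I remember them from recent Hadoop releases - verify against your version's docs):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class BatchingDriver {
    public static JobConf configure(JobConf conf) {
        // Give each map task a fixed-size batch of records (one record per line).
        conf.setInputFormat(NLineInputFormat.class);
        // ~24k lines per map ~= one minute of work at 400 records/sec.
        conf.setInt("mapred.line.input.format.linespermap", 24000);
        return conf;
    }
}
```

The point is that batching is a one-line knob, not something you have to build yourself.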
Brian
On Feb 3, 2009, at 8:34 AM, cdwillie76 wrote:
I have an application I would like to apply Hadoop to, but I'm not sure
if the tasking is too small. I have a file that contains between 70,000
and 400,000 records. All the records can be processed in parallel, and I
can currently process them at 400 records a second single-threaded (give
or take). I thought I read somewhere (in one of the tutorials) that
mapper tasks should run for at least a minute to offset the overhead of
creating them. Is this really the case? I'm pretty sure that one record
per mapper is overkill, but I'm wondering whether batching them up for
the mapper is still the way to go, or if I should look at some other
framework to help split up the processing.
Any insight would be appreciated.
Thanks
Chris
Sent from the Hadoop core-user mailing list archive at Nabble.com.