Here is an informal description of the map/reduce model:

In the map/reduce paradigm the input data usually consists of a (very large) number of records. The paradigm assumes that you want to do some computation on each input record separately (without simultaneous access to other records) to produce some result (the map function). The results from all the records are then grouped (based on a key), and each group of results can be processed further together (the reduce function) to produce a final result for each group. Global parameters can also be made visible to the map function.

So you have to try to fit your problem into this model, and if that is possible, you can rewrite your program using the Hadoop native libraries.
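As an illustration, here is a rough sketch of the classic word-count example written against the org.apache.hadoop.mapred API; the parameter name "my.parameter" is made up, just to show how a global parameter can be passed down to the map function:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

  // Map: called once per input record (here, one line of text),
  // with no access to any other record; emits (word, 1) pairs.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    private String myParameter;

    // Global parameters set on the JobConf in main() are visible here.
    public void configure(JobConf job) {
      myParameter = job.get("my.parameter");
    }

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        output.collect(word, ONE);
      }
    }
  }

  // Reduce: called once per key with all the values grouped under
  // that key; produces one final result per group.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    // An illustrative global parameter, readable in configure() above.
    conf.set("my.parameter", "some value");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setReducerClass(Reduce.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

The framework takes care of splitting the input, grouping the map output by key, and distributing the tasks across the cluster; your code only supplies the map and reduce functions.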
regards,
Deyaa
Igor Nikolic wrote:
Thank you for your comment; it did confirm my suspicions.
You framed the problem correctly. I will probably invest a bit of time studying the framework anyway, to see whether a rewrite is worthwhile, since we have hit scaling limitations on our agent scheduler framework. Our main computational load is the massive amount of agent reasoning (think JBoss Rules) and inter-agent communication (they need to buy and sell things from each other), so I am not sure it is possible to break it down into small tasks at all, especially if this needs to happen across CPUs; the latency is going to kill us.
Thanks
igor
John Martyniak wrote:
I am new to Hadoop, so take this information with a grain of salt.

The power of Hadoop is breaking big problems down into small pieces and spreading them across many (thousands of) machines, in effect creating a massively parallel processing engine. But in order to take advantage of that functionality, you must write your application against the Hadoop framework.

So if I understand your dilemma correctly, I do not think that Hadoop is for you, unless you want to rewrite your app to take advantage of it. And I suspect that if you have access to a traditional cluster, that will be a better alternative for you.
Hope that this helps some.
-John
On Wed, Jun 25, 2008 at 7:33 AM, Igor Nikolic <[EMAIL PROTECTED]> wrote:
Hello list
We will be getting access to a cluster soon, and I was wondering whether I should use Hadoop, or whether I am better off with the usual batch schedulers such as ProActive. I am not a CS/CE person, and from reading the website I cannot get a sense of whether Hadoop is for me.
A little background:
We have a relatively large agent-based simulation (20+ MB jar) that needs to be swept across very large parameter spaces. Agents communicate only within the simulation, so there is no interprocess communication. The parameter vector is at most 20 elements long, the simulation may take 5-10 minutes on a normal desktop, and it might return a few MB of raw data. We need 10k-100k runs, more if possible.
Thanks for any advice; even a short yes/no is welcome.
Greetings
Igor
--
ir. Igor Nikolic
PhD Researcher
Section Energy & Industry
Faculty of Technology, Policy and Management
Delft University of Technology, The Netherlands
Tel: +31152781135
Email: [EMAIL PROTECTED]
Web: http://www.igornikolic.com
wiki server: http://wiki.tudelft.nl