I do not totally understand the job you are running, but if each simulation
can run independently of the others, then you could run a MapReduce job that
spreads the simulations over many servers, so each server runs one or more
at a time. This gives you some protection against servers going down and
takes care of spreading the work across servers. It should also handle more
than the 100K simulation mark you stated you would like to run. You would
just need to write the input code to split the simulations into splits that
the MR framework can work with.
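For what it's worth, here is a rough, untested sketch of what that input
code might look like, assuming you keep one parameter vector per line of a
text file and shell out to the simulation jar from each map task. The class
name SimulationSweep, the file simulation.jar, and the command line are all
placeholders, and the job only records the exit status instead of collecting
the real output:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only job: each map task receives one line of the input file
// (one parameter vector) and runs a single simulation for it.
public class SimulationSweep {

  public static class SimulationMapper
      extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable offset, Text paramLine, Context context)
        throws IOException, InterruptedException {
      // Placeholder command: run one simulation with this parameter
      // vector; its console output goes to the task logs.
      Process sim = new ProcessBuilder("java", "-jar", "simulation.jar",
          paramLine.toString()).inheritIO().start();
      int status = sim.waitFor();
      // Emit the parameter vector and the exit status; a real job would
      // also copy the simulation's raw output somewhere durable (e.g. HDFS).
      context.write(paramLine, new Text("exit=" + status));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "simulation parameter sweep");
    job.setJarByClass(SimulationSweep.class);
    job.setMapperClass(SimulationMapper.class);
    job.setNumReduceTasks(0); // map-only: no reduce step is needed
    job.setInputFormatClass(NLineInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    // One input line, i.e. one simulation, per map task.
    NLineInputFormat.setNumLinesPerSplit(job, 1);
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The nice part is that the framework handles retries for you: if a node dies
mid-run, its simulations get rescheduled on another machine, which is the
failure protection I mentioned above.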
Billy
"Igor Nikolic" <[EMAIL PROTECTED]> wrote in
message news:[EMAIL PROTECTED]
Thank you for your comment, it did confirm my suspicions.
You framed the problem correctly. I will probably invest a bit of time
studying the framework anyway, to see if a rewrite is interesting, since
we are hitting scaling limitations on our agent scheduler framework. Our
main computational load is the massive amount of agent reasoning (think
JBoss Rules) and inter-agent communication (they need to buy and sell
stuff to each other), so I am not sure it is at all possible to break it
down into small tasks, especially if this needs to happen across CPUs;
the latency is going to kill us.
Thanks
igor
John Martyniak wrote:
I am new to Hadoop, so take this information with a grain of salt.
The power of Hadoop is in breaking big problems down into small pieces and
spreading them across many (thousands of) machines, in effect creating a
massively parallel processing engine. But in order to take advantage of
that functionality, you must write your application against the Hadoop
frameworks.
So if I understand your dilemma correctly, I do not think that Hadoop is
for you, unless you want to rewrite your app to take advantage of it. And
I suspect that if you have access to a traditional cluster, that will be a
better alternative for you.
Hope that this helps some.
-John
On Wed, Jun 25, 2008 at 7:33 AM, Igor Nikolic <[EMAIL PROTECTED]> wrote:
Hello list
We will be getting access to a cluster soon, and I was wondering whether I
should use Hadoop for this, or whether I am better off with the usual batch
schedulers such as ProActive. I am not a CS/CE person, and from reading the
website I cannot get a sense of whether Hadoop is for me.
A little background:
We have a relatively large agent-based simulation (a 20+ MB jar) that needs
to be swept across very large parameter spaces. Agents communicate only
within the simulation, so there is no interprocess communication. The
parameter vector is at most 20 long, a simulation may take 5-10 minutes on
a normal desktop, and it might return a few MB of raw data. We need
10k-100K runs, more if possible.
Thanks for any advice; even a short yes/no is welcome.
Greetings
Igor
--
ir. Igor Nikolic
PhD Researcher
Section Energy & Industry
Faculty of Technology, Policy and Management
Delft University of Technology, The Netherlands
Tel: +31152781135
Email: [EMAIL PROTECTED]
Web: http://www.igornikolic.com
wiki server: http://wiki.tudelft.nl