When there are non-daemon threads still running (JMX threads being our #1 cause), the JVM will not exit without help.

This is in TaskTracker.java; in 0.16.0 it is at line 2088, in the finally clause of Child.main:

       LogManager.shutdown();
       System.exit(0); // force the JVM to exit even if it still has threads running; this prevents memory-expensive JVMs from being left around
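
For context, here is a minimal sketch of what the tail of Child.main looks like with that patch in place; the surrounding structure and names are assumptions for illustration, not a copy of the 0.16.0 source.

    import org.apache.log4j.LogManager;

    // Hypothetical outline; only the finally clause reflects the change above.
    public class Child {
      public static void main(String[] args) {
        try {
          // ... set up the task and run the user's mapper or reducer ...
        } catch (Throwable t) {
          t.printStackTrace();
        } finally {
          LogManager.shutdown(); // flush and close the log4j appenders
          // Force the JVM down even if non-daemon threads (JMX, user threads)
          // are still running; a plain return would leave the process alive.
          System.exit(0);
        }
      }
    }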


Devaraj Das wrote:
Jason, I didn't get that. The JVM should exit naturally even without calling
System.exit. Where exactly did you insert the System.exit? Please clarify.
Thanks!
-----Original Message-----
From: Jason Venner [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 18, 2008 6:48 PM
To: core-user@hadoop.apache.org
Subject: Re: Reusing jobs

We have terrible issues with threads in the JVMs holding down resources, causing the compute nodes to run out of memory and lock up. We in fact patch the task JVM (TaskTracker's Child) to call System.exit, to ensure that the resources are freed.

This is particularly a problem for mappers/reducers that enable JMX or spin off many threads for internal processing.
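
To illustrate the underlying problem (this is not code from the thread): any plain, non-daemon thread started by user code keeps the JVM alive after main returns, unless it is marked as a daemon or the process calls System.exit.

    // Illustration only: a worker thread that is never marked as a daemon.
    public class NonDaemonExample {
      public static void main(String[] args) {
        Thread worker = new Thread(new Runnable() {
          public void run() {
            while (true) {
              try { Thread.sleep(60000); } catch (InterruptedException e) { return; }
            }
          }
        });
        // worker.setDaemon(true); // without this, the JVM cannot exit on its own
        worker.start();
        System.out.println("main is done, but the process keeps running");
      }
    }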

Our solution is to tune the input split size so that the minimum mapper run time is greater than one minute.
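
A rough sketch of that tuning against the classic mapred API; the exact knob and its default vary by Hadoop version, and the numbers here are purely illustrative and would need to be sized so each mapper runs for more than a minute on your data.

    import org.apache.hadoop.mapred.JobConf;

    public class SplitTuning {
      public static void configure(JobConf conf) {
        // Larger splits mean fewer, longer-running mappers, so the per-task
        // JVM startup cost is amortized.
        conf.setLong("mapred.min.split.size", 256L * 1024 * 1024); // ~256 MB per split
        conf.setNumMapTasks(10); // a hint only; the actual count follows the splits
      }
    }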

Karl Wettin wrote:
Ted Dunning wrote:
Hadoop has enormous startup costs that are relatively inherent in the current design.

Most notably, mappers and reducers are executed in a standalone JVM (ostensibly for safety reasons).

Is it possible to hack in support to reuse JVMs? Keep it alive until it times out, and have it execute jobs by opening a socket and saying hello? What classes should I start looking in? Could be a fun exercise.

          karl
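
As a toy illustration of the "keep it alive and talk to it over a socket" idea (not Hadoop code; the port, protocol, and idle timeout are all made up):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.net.SocketTimeoutException;

    // A long-lived worker JVM that accepts task names over a socket until it
    // has been idle for a minute, then shuts itself down.
    public class ReusableWorker {
      public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(50050)) { // arbitrary port
          server.setSoTimeout(60000); // give up after 60 s with no new work
          while (true) {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()))) {
              String task = in.readLine();
              if (task == null || task.equals("quit")) break;
              System.out.println("running task: " + task); // run the real work here
            }
          }
        } catch (SocketTimeoutException idle) {
          System.out.println("idle timeout, shutting down");
        }
        System.exit(0); // still force exit in case a task left non-daemon threads
      }
    }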




On 4/17/08 6:00 PM, "Karl Wettin" <[EMAIL PROTECTED]> wrote:

Is it possible to execute a job more than once?

I use MapReduce when adding a new instance to a hierarchical cluster tree. It finds the least distant node and inserts the new instance as a sibling of that node.

As far as I know, it is in the very nature of this algorithm that one inserts one instance at a time; this is how the second dimension is created that makes it better than a vector cluster. It would be possible to map all permutations of instances and skip the reduction, but that would result in many more calculations than iteratively training the tree, as the latter only requires testing against the instances already inserted into the tree.

Iteratively training this tree using Hadoop means executing one job per instance; each job measures the distance to all instances in a file, which I also append the new instance to once it is inserted into the tree.

All of the above is very inefficient, especially with a young tree that could be trained in nanoseconds locally. So I do that until it takes 20 seconds to insert an instance.

But really, this is all Hadoop framework overhead. I'm not quite sure of all it does when I execute a job, but it seems like quite a lot. And all I'm doing is executing a couple of identical jobs over and over again using new data.

It would be very nice if it just took a few milliseconds to do that.

       karl
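
For concreteness, a rough sketch of the per-instance job described above, written against the old mapred API: the mapper computes the distance from every stored instance to the new one, and a single reducer keeps the minimum. The class names, types, and distance function are placeholders, not Karl's actual code.

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class NearestNodeJob {

      // Maps each stored instance (one per line) to its distance from the new
      // instance; a real job would ship the new instance via the JobConf.
      public static class DistanceMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, DoubleWritable> {
        private static final Text KEY = new Text("nearest");

        public void map(LongWritable offset, Text storedInstance,
                        OutputCollector<Text, DoubleWritable> out, Reporter reporter)
            throws IOException {
          double d = distance(storedInstance.toString(), "NEW_INSTANCE"); // placeholder
          out.collect(KEY, new DoubleWritable(d));
        }

        private double distance(String a, String b) {
          return Math.abs(a.length() - b.length()); // stand-in for the real metric
        }
      }

      // Keeps the smallest distance; a real job would also carry the node id
      // so the new instance can be inserted as a sibling of that node.
      public static class MinReducer extends MapReduceBase
          implements Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        public void reduce(Text key, Iterator<DoubleWritable> values,
                           OutputCollector<Text, DoubleWritable> out, Reporter reporter)
            throws IOException {
          double min = Double.MAX_VALUE;
          while (values.hasNext()) {
            min = Math.min(min, values.next().get());
          }
          out.collect(key, new DoubleWritable(min));
        }
      }
    }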

--
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested
