Two things that can make a job slow to start are replicating the distributed cache items, and unpacking those items and otherwise preparing the local task directory on the task trackers. Keep in mind that the job jar is itself a distributed cache item.
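
If you suspect the job jar, one quick check is to see how much data has to be copied and unpacked on each task tracker. Here is a minimal sketch in Python that relies only on a jar being an ordinary zip archive. The function name `largest_jar_entries` is a placeholder of mine, not part of Hadoop, and this only measures the jar; it does not tell you how long Hadoop actually spends on it.

```python
import zipfile


def largest_jar_entries(jar_path, top_n=10):
    """Return (total_unpacked_bytes, [(entry_name, size), ...]) for a jar.

    The list holds the top_n largest entries by uncompressed size,
    largest first. Directory entries are skipped.
    """
    with zipfile.ZipFile(jar_path) as jar:
        entries = [
            (info.filename, info.file_size)
            for info in jar.infolist()
            if not info.is_dir()
        ]
    total = sum(size for _, size in entries)
    # Sort by uncompressed size so bundled libraries stand out.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return total, entries[:top_n]
```

Calling `largest_jar_entries("yourjar.jar")` and printing the result should show whether a few bundled dependencies account for most of the unpacked size. If they do, trimming the jar is one thing to try before digging deeper.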

On Thu, Jul 2, 2009 at 5:48 PM, Philip Zeyliger <[email protected]> wrote:

> You can try to run it via LocalJobRunner ("hadoop jar yourjar -jt
> local" if you're using GenericOptionsParser), and see if it exhibits
> the same behavior there. It's easy to push that into a debugger
> (HADOOP_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8020" and
> point Eclipse at it) to set some breakpoints and see what's going on.
>
> Cheers,
>
> -- Philip
>
> On Thu, Jul 2, 2009 at 5:22 PM, Amandeep Khurana<[email protected]> wrote:
> > How do I figure out whats going on while a job is trying to initialize? I
> > have a job thats importing data from a DB into HBase and it takes very long
> > to initialize. The time is enough to cause a time out of the mappers and
> > eventually kill the job.
> >
> > Amandeep
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
