We're experiencing a recurring problem and wondering if others have seen this:
We're mapping a table with about 2 million rows across 100 regions on 40 nodes. In each map, we do a random read against the same table. We're encountering a situation that looks a lot like deadlock.

When the job is launched, some of the tasktrackers appear to block on the first random read. The only trace we get is an eventual UnknownScannerException in the RegionServer log, at which point the task is actually reported as successfully completed by MapReduce (1 row processed). There is no error in the task's log, and the job completes as SUCCESSFUL with an incomplete number of rows. In the worst case, we've seen ALL the tasktrackers hit this problem: the job completes successfully with 100 rows processed (1 per region).

When we remove the code that does the random read in the map, there are no problems.

Anyone? This is driving me crazy because I can't reproduce it locally (it only seems to be a problem in a distributed environment with many nodes) and because there is no stacktrace besides the scanner exception, which is clearly a symptom, not a cause.

j
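P.S. For clarity, the map is essentially doing the following. This is an illustrative sketch only, not our actual code; it assumes the org.apache.hadoop.hbase.mapreduce.TableMapper API, and the table name and row-key scheme are made up:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: a mapper fed by a scan over a table that also issues a
// random Get against that SAME table from inside map().
public class RandomReadMapper extends TableMapper<ImmutableBytesWritable, Result> {

  private HTable table;  // client handle to the same table the job is scanning

  @Override
  protected void setup(Context context) throws IOException {
    // "the_table" is a placeholder name
    table = new HTable(context.getConfiguration(), "the_table");
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // The random read happens while the scanner for this region is
    // still open -- this is where the tasks appear to block.
    byte[] randomRow = pickRandomRow();
    Result r = table.get(new Get(randomRow));
    // ... process r and the scanned row, then emit ...
  }

  // Illustrative row-key scheme: ~2M rows keyed "row-0" .. "row-1999999"
  private byte[] pickRandomRow() {
    return Bytes.toBytes("row-" + (long) (Math.random() * 2000000L));
  }
}
```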