Andrzej helped me a lot by pointing me to the kill -SIGQUIT [pid] command, where [pid] is the process ID of the stuck task's JVM. It makes the JVM write a Java thread dump to its stdout (which is captured in logs/userlogs/[task]/stdout/part-######). This is a lifesaver if a task is hanging somewhere and you're not sure why.
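In case it helps, here is a rough sketch of how I would wrap that command in a small shell function. The name dump_threads is made up for illustration, and finding the right pid is left to you (for example with ps, or with jps -l from the JDK). It uses kill -s QUIT, which is the portable spelling of the same signal.

```shell
# Hypothetical helper (not part of Hadoop): ask a JVM to print a thread dump
# to its own stdout by sending it SIGQUIT.
dump_threads() {
    pid="${1:-}"
    if [ -z "$pid" ]; then
        echo "usage: dump_threads <pid>" >&2
        return 2
    fi
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process (or no permission): $pid" >&2
        return 1
    fi
    # Same signal as kill -SIGQUIT [pid]; the dump goes to the JVM's stdout,
    # not to this terminal.
    kill -s QUIT "$pid"
}
```

Call it as dump_threads [pid], then look for the dump in the task's stdout log mentioned above. Taking two or three dumps a few seconds apart makes it easier to see which threads are not moving.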
--Ned

On 10/8/07, Ming Yang <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have set up 2-node cluster running on Ubuntu 7.04
> and tested the examples, including wordcount and pi.
> But the jobs don't always finish. Sometimes the reduce
> tasks hang in the middle, such as 13%, and there's no
> network traffic between nodes and no CPU usage.
> I have been trying all different ways to make it more stable
> but no luck. I checked the DFS and found all blocks are
> under-replicated. Is this the cause of it? I really appreciate
> anyone who can share some experience in this type of
> problem. Thank you!
>
> Ming Yang
