Allen Wittenauer wrote:
> On Apr 8, 2010, at 9:37 AM, stephen mulcahy wrote:
>> When I run this on the Debian 2.6.32 kernel - over the course of the run, 1 or
>> 2 datanodes of the cluster enter a state whereby they are no longer responsive
>> to network traffic.

> How much free memory do you have?

Lots, a few GB


> How many tasks per node do you have?

I left this at the default.


> What are the service times, etc., on your IO system?

Can you clarify this query?


>> Has anyone run into similar problems with their environments? I noticed that
>> when the nodes become unresponsive, it often happens when the TeraSort is at

> I've always seen Linux nodes go unresponsive when they get memory starved to
> the point that the OOM can't function because it can't allocate enough mem.

Sure, but I can log in to the unresponsive nodes via the console; it's just the network that has become unresponsive. To be clear, I don't suspect Hadoop is the root cause of the problem. I suspect either a kernel bug or some other operating-system-level bug. I was wondering if others had run into similar problems.

I was also wondering in general what kernel versions and distros people are using, especially for larger production clusters.

Thanks,

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com
