RE: Name Node HA (HADOOP-4539)

2009-06-22 Thread Brian.Levine
If the BackupNode doesn't promise HA, then how would additional testing on this 
feature aid in the HA story?  Maybe you could expand on the purpose of 
HADOOP-4539 because now I'm confused.

How does the approaching 0.21 cutoff translate into a release date for 0.21?

-brian
 

-Original Message-
From: ext Steve Loughran [mailto:ste...@apache.org] 
Sent: Monday, June 22, 2009 5:36 AM
To: core-user@hadoop.apache.org
Subject: Re: Name Node HA (HADOOP-4539)

Andrew Wharton wrote:
 https://issues.apache.org/jira/browse/HADOOP-4539
 
 I am curious about the state of this fix. It is listed as
 Incompatible, but is resolved and committed (according to the
 comments). Is the backup name node going to make it into 0.21? Will it
 remove the SPOF for HDFS? And if so, what is the proposed release
 timeline for 0.21?
 
 

The way to deal with HA (which the BackupNode doesn't promise) is to get 
involved in developing and testing the leading-edge source tree.

The 0.21 cutoff is approaching and the BackupNode is in there, but it needs a 
lot more tests. If you want to aid the development, helping to get more 
automated BackupNode tests in there (indeed, tests that simulate more 
complex NN failures, like a corrupt EditLog) would go a long way.
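
For concreteness, a rough sketch of the kind of test meant here, assuming the 
MiniDFSCluster test harness (JUnit 3 style). The edits-file path, the 
test.build.data default, and the expectation that a restart over a damaged log 
throws IOException are assumptions and may not hold for every Hadoop version:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

import junit.framework.TestCase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class TestCorruptEditLog extends TestCase {

  public void testRestartWithCorruptEditLog() throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
    try {
      // Generate some edit-log activity.
      FileSystem fs = cluster.getFileSystem();
      fs.mkdirs(new Path("/corrupt-editlog-test"));
    } finally {
      cluster.shutdown();
    }

    // Assumed default layout of MiniDFSCluster's first name directory;
    // the exact path may differ between Hadoop versions.
    File edits = new File(System.getProperty("test.build.data", "build/test/data"),
                          "dfs/name1/current/edits");

    // Flip some bytes in the middle of the edit log.
    RandomAccessFile raf = new RandomAccessFile(edits, "rw");
    try {
      raf.seek(raf.length() / 2);
      raf.writeBytes("garbage");
    } finally {
      raf.close();
    }

    // Restarting over the damaged log (format=false) should fail, or fall
    // back to a good storage copy, depending on the version under test.
    try {
      MiniDFSCluster restarted = new MiniDFSCluster(conf, 1, false, null);
      restarted.shutdown();
      fail("NameNode came back up despite a corrupt edit log");
    } catch (IOException expected) {
      // corruption detected - the behaviour this test is meant to pin down
    }
  }
}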

-steve


RE: Too many open files error, which gets resolved after some time

2009-06-21 Thread Brian.Levine
IMHO, you should never rely on finalizers to release scarce resources since you 
don't know when the finalizer will get called, if ever.
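
A minimal sketch of the explicit-close pattern being referred to, using the 
Hadoop FileSystem API; the class name and path below are illustrative only, 
not from the original thread:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ExplicitCloseExample {
  public static void readSome(Configuration conf) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream in = fs.open(new Path("/some/hdfs/file"));
    try {
      byte[] buf = new byte[4096];
      int n = in.read(buf);        // do the real work here
      System.out.println("read " + n + " bytes");
    } finally {
      IOUtils.closeStream(in);     // descriptors released now, not when
    }                              // (or if) finalize() eventually runs
  }
}

Closing in a finally block (or via IOUtils.closeStream, which swallows the 
secondary exception) returns the descriptors even if the read itself fails.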

-brian

 

-Original Message-
From: ext jason hadoop [mailto:jason.had...@gmail.com] 
Sent: Sunday, June 21, 2009 11:19 AM
To: core-user@hadoop.apache.org
Subject: Re: Too many open files error, which gets resolved after some time

HDFS/DFS client uses quite a few file descriptors for each open file.

Many application developers (but not the hadoop core) rely on the JVM
finalizer methods to close open files.

This combination, especially when many HDFS files are open, can result in
very large file-descriptor demands for Hadoop clients.
As a general rule we never run a cluster with nofile set below 64k, and for
larger clusters with demanding applications we have had it set 10x higher. I
also believe there was a set of JVM versions that leaked file descriptors
used for NIO in the HDFS core. I do not recall the exact details.

On Sun, Jun 21, 2009 at 5:27 AM, Stas Oskin stas.os...@gmail.com wrote:

 Hi.

 After some more tracing with the lsof utility, I managed to stop the
 growth on the DataNode process, but I still have issues with my DFS client.

 It seems that my DFS client opens hundreds of pipes and eventpolls. Here is
 a small part of the lsof output:

 java    10508  root  387w  FIFO  0,6    6142565  pipe
 java    10508  root  388r  FIFO  0,6    6142565  pipe
 java    10508  root  389u        0,100  6142566  eventpoll
 java    10508  root  390u  FIFO  0,6    6135311  pipe
 java    10508  root  391r  FIFO  0,6    6135311  pipe
 java    10508  root  392u        0,100  6135312  eventpoll
 java    10508  root  393r  FIFO  0,6    6148234  pipe
 java    10508  root  394w  FIFO  0,6    6142570  pipe
 java    10508  root  395r  FIFO  0,6    6135857  pipe
 java    10508  root  396r  FIFO  0,6    6142570  pipe
 java    10508  root  397r        0,100  6142571  eventpoll
 java    10508  root  398u  FIFO  0,6    6135319  pipe
 java    10508  root  399w  FIFO  0,6    6135319  pipe

 I'm using FSDataInputStream and FSDataOutputStream, so this might be
 related to pipes?

 So, my questions are:

 1) What causes these pipes/epolls to appear?

 2) More importantly, how can I prevent their accumulation and growth?

 Thanks in advance!

 2009/6/21 Stas Oskin stas.os...@gmail.com

  Hi.
 
  I have an HDFS client and an HDFS datanode running on the same machine.
 
  When I try to access a dozen files at once from the client, several times
  in a row, I start to receive the following errors on the client and in the
  HDFS browse function.
 
  HDFS Client: Could not get block locations. Aborting...
  HDFS browse: Too many open files
 
  I can increase the maximum number of files that can be opened, as I have it
  set to the default 1024, but I would like to solve the problem first, as a
  larger value just means it would run out of files again later on.
 
  So my questions are:
 
  1) Does the HDFS datanode keep any files open, even after the HDFS
  client has already closed them?
 
  2) Is it possible to find out who keeps the files open - the datanode or the
  client (so I could pinpoint the source of the problem)?
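
One cheap way to answer that second question from inside the client is to count
the JVM's own descriptors via /proc/self/fd; a Linux-only sketch, not part of
Hadoop:

import java.io.File;

public class FdCount {
  public static int openDescriptors() {
    // Each entry under /proc/self/fd is one open descriptor in this JVM.
    String[] fds = new File("/proc/self/fd").list();
    return fds == null ? -1 : fds.length;   // -1 if /proc is unavailable
  }

  public static void main(String[] args) {
    System.out.println("open fds: " + openDescriptors());
  }
}

If this number climbs in the client while the DataNode's lsof count stays flat,
the leak is on the client side.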
 
  Thanks in advance!
 




-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals