IMHO, you should never rely on finalizers to release scarce resources since you don't know when the finalizer will get called, if ever.
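Closing explicitly takes GC timing out of the picture entirely. A minimal sketch of the explicit-close pattern (assuming a configured Hadoop FileSystem; the path and class name are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExplicitClose {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical path, for illustration only.
        FSDataInputStream in = fs.open(new Path("/tmp/example.dat"));
        try {
            byte[] buf = new byte[4096];
            int n = in.read(buf);
            // ... process the n bytes read ...
        } finally {
            // Release the underlying descriptors now, not whenever
            // (if ever) the finalizer happens to run.
            in.close();
        }
    }
}

The same try/finally discipline applies to FSDataOutputStream: close it as soon as you are done writing.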
-brian

-----Original Message-----
From: ext jason hadoop [mailto:jason.had...@gmail.com]
Sent: Sunday, June 21, 2009 11:19 AM
To: core-user@hadoop.apache.org
Subject: Re: "Too many open files" error, which gets resolved after some time

The HDFS/DFS client uses quite a few file descriptors for each open file.
Many application developers (but not the Hadoop core) rely on the JVM
finalizer methods to close open files. This combination, especially when
many HDFS files are open, can result in very large file-descriptor demands
from Hadoop clients. As a general rule we never run a cluster with nofile
less than 64k, and for larger clusters with demanding applications we have
set it 10x higher.

I also believe there was a set of JVM versions that leaked file descriptors
used for NIO in the HDFS core; I do not recall the exact details. (A small
descriptor-counting sketch at the end of this message may help track this
from inside the client.)

On Sun, Jun 21, 2009 at 5:27 AM, Stas Oskin <stas.os...@gmail.com> wrote:
> Hi.
>
> After some more tracing with the lsof utility, I managed to stop the
> growth on the DataNode process, but I still have issues with my DFS
> client.
>
> It seems that my DFS client opens hundreds of pipes and eventpolls. Here
> is a small part of the lsof output:
>
> java 10508 root 387w FIFO 0,6        6142565 pipe
> java 10508 root 388r FIFO 0,6        6142565 pipe
> java 10508 root 389u 0000 0,10     0 6142566 eventpoll
> java 10508 root 390u FIFO 0,6        6135311 pipe
> java 10508 root 391r FIFO 0,6        6135311 pipe
> java 10508 root 392u 0000 0,10     0 6135312 eventpoll
> java 10508 root 393r FIFO 0,6        6148234 pipe
> java 10508 root 394w FIFO 0,6        6142570 pipe
> java 10508 root 395r FIFO 0,6        6135857 pipe
> java 10508 root 396r FIFO 0,6        6142570 pipe
> java 10508 root 397r 0000 0,10     0 6142571 eventpoll
> java 10508 root 398u FIFO 0,6        6135319 pipe
> java 10508 root 399w FIFO 0,6        6135319 pipe
>
> I'm using FSDataInputStream and FSDataOutputStream, so this might be
> related to the pipes?
>
> So, my questions are:
>
> 1) What causes these pipes/epolls to appear?
>
> 2) More importantly, how can I prevent their accumulation and growth?
>
> Thanks in advance!
>
> 2009/6/21 Stas Oskin <stas.os...@gmail.com>
>
> > Hi.
> >
> > I have an HDFS client and an HDFS datanode running on the same machine.
> >
> > When I try to access a dozen files at once from the client, several
> > times in a row, I start to receive the following errors from the
> > client and from the HDFS browse function:
> >
> > HDFS client: "Could not get block locations. Aborting..."
> > HDFS browse: "Too many open files"
> >
> > I can increase the maximum number of files that can be opened, as I
> > have it set to the default 1024, but I would like to solve the
> > underlying problem first, as a larger value just means it would run
> > out of files again later on.
> >
> > So my questions are:
> >
> > 1) Does the HDFS datanode keep any files open even after the HDFS
> > client has already closed them?
> >
> > 2) Is it possible to find out which side keeps the files open - the
> > datanode or the client - so I can pinpoint the source of the problem?
> >
> > Thanks in advance!

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
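Following up on the lsof tracing above: you can also watch the descriptor
count from inside the client process itself, since on Linux each entry in
/proc/self/fd is one descriptor held by the JVM. A minimal, Linux-only
sketch (the class name is mine):

import java.io.File;

public class FdCount {
    public static void main(String[] args) {
        // Linux-specific: /proc/self/fd has one entry per open descriptor,
        // covering regular files, sockets, pipes and epoll instances alike.
        // (The count includes the descriptor used to list the directory.)
        String[] fds = new File("/proc/self/fd").list();
        System.out.println("open descriptors: "
                + (fds == null ? "unknown" : String.valueOf(fds.length)));
    }
}

Logging this number around your FSDataInputStream/FSDataOutputStream open
and close calls makes it easy to see whether the pipe and eventpoll entries
are actually being released.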