Re: Too many open files
Typically, long-lived connections are better, so global. -- Jeff Jirsa > On Jan 22, 2018, at 3:28 AM, Andreou, Arys (Nokia - GR/Athens) > <arys.andr...@nokia.com> wrote: > > It turns out it was a mistake in the client’s implementation. > The session was created for each request but it was never shut down, so all the > connections were left open. > I only needed to execute a cluster.shutdown() once the request was over. > > I do have a follow-up question though. > Is it better to have a global session object or to create it and shut it down > for every request? > > > From: n...@photonhost.com [mailto:n...@photonhost.com] On Behalf Of Nikolay > Mihaylov > Sent: Monday, January 22, 2018 11:47 AM > To: user@cassandra.apache.org > Subject: Re: Too many open files > > You can increase the system open-files limit; > also, if you compact, open files will go down. > > On Mon, Jan 22, 2018 at 10:19 AM, Dor Laor <d...@scylladb.com> wrote: > It's a high number; your compaction may be running behind, and thus > many small sstables exist. However, you're also including the > number of network connections in the count (everything > in *nix is a file). If it makes you feel better, my laptop > has 40k open files for Chrome. > > On Sun, Jan 21, 2018 at 11:59 PM, Andreou, Arys (Nokia - GR/Athens) > <arys.andr...@nokia.com> wrote: > Hi, > > I keep getting a “Last error: Too many open files” followed by a list of node > IPs. > The output of “lsof -n|grep java|wc -l” is about 674970 on each node. > > What is a normal number of open files? > > Thank you. > > >
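The "one global session" advice above can be sketched in Python. This is a minimal illustration of the pattern, not driver code: the connect call is injected so the sketch runs without a live cluster, and the contact point and keyspace shown in the comments are placeholders.

```python
import threading

class SessionHolder:
    """Lazily create one driver session and hand the same instance to
    every caller, instead of connecting (and leaking sockets) per request."""

    def __init__(self, connect_fn):
        self._connect_fn = connect_fn  # e.g. lambda: Cluster(...).connect()
        self._session = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: connect at most once across threads.
        if self._session is None:
            with self._lock:
                if self._session is None:
                    self._session = self._connect_fn()
        return self._session

# With the DataStax Python driver this would look like (placeholders):
#   from cassandra.cluster import Cluster
#   holder = SessionHolder(lambda: Cluster(["127.0.0.1"]).connect("my_ks"))
#   session = holder.get()   # every request reuses the same session
```

The session itself pools and manages connections, so the application only ever needs this single instance; `cluster.shutdown()` belongs at process exit, not per request.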
RE: Too many open files
It turns out it was a mistake in the client’s implementation. The session was created for each request but it was never shut down, so all the connections were left open. I only needed to execute a cluster.shutdown() once the request was over. I do have a follow-up question though. Is it better to have a global session object or to create it and shut it down for every request? From: n...@photonhost.com [mailto:n...@photonhost.com] On Behalf Of Nikolay Mihaylov Sent: Monday, January 22, 2018 11:47 AM To: user@cassandra.apache.org Subject: Re: Too many open files You can increase the system open-files limit; also, if you compact, open files will go down. On Mon, Jan 22, 2018 at 10:19 AM, Dor Laor <d...@scylladb.com<mailto:d...@scylladb.com>> wrote: It's a high number; your compaction may be running behind, and thus many small sstables exist. However, you're also including the number of network connections in the count (everything in *nix is a file). If it makes you feel better, my laptop has 40k open files for Chrome. On Sun, Jan 21, 2018 at 11:59 PM, Andreou, Arys (Nokia - GR/Athens) <arys.andr...@nokia.com<mailto:arys.andr...@nokia.com>> wrote: Hi, I keep getting a “Last error: Too many open files” followed by a list of node IPs. The output of “lsof -n|grep java|wc -l” is about 674970 on each node. What is a normal number of open files? Thank you.
Re: Too many open files
You can increase the system open-files limit; also, if you compact, open files will go down. On Mon, Jan 22, 2018 at 10:19 AM, Dor Laor <d...@scylladb.com> wrote: > It's a high number; your compaction may be running behind, and thus > many small sstables exist. However, you're also including the > number of network connections in the count (everything > in *nix is a file). If it makes you feel better, my laptop > has 40k open files for Chrome. > > On Sun, Jan 21, 2018 at 11:59 PM, Andreou, Arys (Nokia - GR/Athens) < > arys.andr...@nokia.com> wrote: > >> Hi, >> >> >> >> I keep getting a “Last error: Too many open files” followed by a list of >> node IPs. >> >> The output of “lsof -n|grep java|wc -l” is about 674970 on each node. >> >> >> >> What is a normal number of open files? >> >> >> >> Thank you. >> >> >> > >
Re: Too many open files
It's a high number; your compaction may be running behind, and thus many small sstables exist. However, you're also including the number of network connections in the count (everything in *nix is a file). If it makes you feel better, my laptop has 40k open files for Chrome. On Sun, Jan 21, 2018 at 11:59 PM, Andreou, Arys (Nokia - GR/Athens) < arys.andr...@nokia.com> wrote: > Hi, > > > > I keep getting a “Last error: Too many open files” followed by a list of > node IPs. > > The output of “lsof -n|grep java|wc -l” is about 674970 on each node. > > > > What is a normal number of open files? > > > > Thank you. > > >
Too many open files
Hi, I keep getting a "Last error: Too many open files" followed by a list of node IPs. The output of "lsof -n|grep java|wc -l" is about 674970 on each node. What is a normal number of open files? Thank you.
Re: Too many open files Cassandra 2.1.11.872
cat /proc/5980/limits

Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            8388608      unlimited    bytes
Max core file size        0            unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             2063522      2063522      processes
Max open files            10           10           files
Max locked memory         unlimited    unlimited    bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       2063522      2063522      signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us

On Fri, Nov 6, 2015 at 4:01 PM, Sebastian Estevez < sebastian.este...@datastax.com> wrote: > You probably need to configure ulimits correctly <http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installRecommendSettings.html>. > > What does this give you? > > /proc/<pid>/limits > > > All the best, > > Sebastián Estévez > > Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com > > DataStax is the fastest, most scalable distributed database technology, > delivering Apache Cassandra to the world’s most innovative enterprises. > DataStax is built to be agile, always-on, and predictably scalable to any > size. With more than 500 customers in 45 countries, DataStax is the > database technology and transactional backbone of choice for the world’s > most innovative companies such as Netflix, Adobe, Intuit, and eBay.
> > On Fri, Nov 6, 2015 at 1:56 PM, Branton Davis <branton.da...@spanning.com> > wrote: > >> We recently went down the rabbit hole of trying to understand the output >> of lsof. lsof -n has a lot of duplicates (files opened by multiple >> threads). Use 'lsof -p $PID' or 'lsof -u cassandra' instead. >> >> On Fri, Nov 6, 2015 at 12:49 PM, Bryan Cheng <br...@blockcypher.com> >> wrote: >> >>> Is your compaction progressing as expected? If not, this may cause an >>> excessive number of tiny db files. Had a node refuse to start recently >>> because of this, had to temporarily remove limits on that process. >>> >>> On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis <jle...@packetnexus.com> >>> wrote: >>> >>>> I'm getting too many open files errors and I'm wondering what the >>>> cause may be. >>>> >>>> lsof -n | grep java show 1.4M files >>>> >>>> ~90k are inodes >>>> ~70k are pipes >>>> ~500k are cassandra services in /usr >>>> ~700K are the data files. >>>> >>>> What might be causing so many files to be open? >>>> >>>> jas >>>> >>> >>> >> >
Re: Re: Too many open files Cassandra 2.1.11.872
Many connections? 郝加来 From: Jason Lewis Date: 2015-11-07 10:38 To: user@cassandra.apache.org Subject: Re: Too many open files Cassandra 2.1.11.872 cat /proc/5980/limits

Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            8388608      unlimited    bytes
Max core file size        0            unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             2063522      2063522      processes
Max open files            10           10           files
Max locked memory         unlimited    unlimited    bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       2063522      2063522      signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us

On Fri, Nov 6, 2015 at 4:01 PM, Sebastian Estevez <sebastian.este...@datastax.com> wrote: You probably need to configure ulimits correctly. What does this give you? /proc/<pid>/limits All the best, Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com On Fri, Nov 6, 2015 at 1:56 PM, Branton Davis <branton.da...@spanning.com> wrote: We recently went down the rabbit hole of trying to understand the output of lsof. lsof -n has a lot of duplicates (files opened by multiple threads). Use 'lsof -p $PID' or 'lsof -u cassandra' instead. On Fri, Nov 6, 2015 at 12:49 PM, Bryan Cheng <br...@blockcypher.com> wrote: Is your compaction progressing as expected? If not, this may cause an excessive number of tiny db files.
Had a node refuse to start recently because of this, had to temporarily remove limits on that process. On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis <jle...@packetnexus.com> wrote: I'm getting too many open files errors and I'm wondering what the cause may be. lsof -n | grep java shows 1.4M files ~90k are inodes ~70k are pipes ~500k are cassandra services in /usr ~700K are the data files. What might be causing so many files to be open? jas
Re: Too many open files Cassandra 2.1.11.872
We recently went down the rabbit hole of trying to understand the output of lsof. lsof -n has a lot of duplicates (files opened by multiple threads). Use 'lsof -p $PID' or 'lsof -u cassandra' instead. On Fri, Nov 6, 2015 at 12:49 PM, Bryan Cheng <br...@blockcypher.com> wrote: > Is your compaction progressing as expected? If not, this may cause an > excessive number of tiny db files. Had a node refuse to start recently > because of this, had to temporarily remove limits on that process. > > On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis <jle...@packetnexus.com> > wrote: > >> I'm getting too many open files errors and I'm wondering what the >> cause may be. >> >> lsof -n | grep java shows 1.4M files >> >> ~90k are inodes >> ~70k are pipes >> ~500k are cassandra services in /usr >> ~700K are the data files. >> >> What might be causing so many files to be open? >> >> jas >> > >
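The duplicate-row problem described above can also be sidestepped by asking the kernel directly: /proc/<pid>/fd holds exactly one entry per descriptor, whereas `lsof -n | grep java | wc -l` can list one row per thread per descriptor. A Linux-only sketch:

```python
import os

def open_fd_count(pid: int) -> int:
    """Count a process's actual open file descriptors via /proc.

    /proc/<pid>/fd contains one symlink per open descriptor, so this
    avoids the per-thread duplication that inflates `lsof -n` counts.
    Linux-only; requires permission to read the target's /proc entry.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))

# e.g. open_fd_count(5980) for the Cassandra PID discussed in this thread
```

Comparing this figure against the soft limit from /proc/<pid>/limits tells you how close the process really is to EMFILE.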
Re: Too many open files Cassandra 2.1.11.872
Is your compaction progressing as expected? If not, this may cause an excessive number of tiny db files. Had a node refuse to start recently because of this, had to temporarily remove limits on that process. On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis <jle...@packetnexus.com> wrote: > I'm getting too many open files errors and I'm wondering what the > cause may be. > > lsof -n | grep java shows 1.4M files > > ~90k are inodes > ~70k are pipes > ~500k are cassandra services in /usr > ~700K are the data files. > > What might be causing so many files to be open? > > jas >
Too many open files Cassandra 2.1.11.872
I'm getting too many open files errors and I'm wondering what the cause may be. lsof -n | grep java shows 1.4M files ~90k are inodes ~70k are pipes ~500k are cassandra services in /usr ~700K are the data files. What might be causing so many files to be open? jas
Re: Too many open files Cassandra 2.1.11.872
You probably need to configure ulimits correctly <http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installRecommendSettings.html>. What does this give you? /proc/<pid>/limits All the best, Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com On Fri, Nov 6, 2015 at 1:56 PM, Branton Davis <branton.da...@spanning.com> wrote: > We recently went down the rabbit hole of trying to understand the output > of lsof. lsof -n has a lot of duplicates (files opened by multiple > threads). Use 'lsof -p $PID' or 'lsof -u cassandra' instead. > > On Fri, Nov 6, 2015 at 12:49 PM, Bryan Cheng <br...@blockcypher.com> > wrote: > >> Is your compaction progressing as expected? If not, this may cause an >> excessive number of tiny db files. Had a node refuse to start recently >> because of this, had to temporarily remove limits on that process. >> >> On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis <jle...@packetnexus.com> >> wrote: >> >>> I'm getting too many open files errors and I'm wondering what the >>> cause may be.
>>> >>> lsof -n | grep java show 1.4M files >>> >>> ~90k are inodes >>> ~70k are pipes >>> ~500k are cassandra services in /usr >>> ~700K are the data files. >>> >>> What might be causing so many files to be open? >>> >>> jas >>> >> >> >
Re: too many open files
Maybe the drivers should have two modes: few sessions, and lots of sessions. The former would give you a developer-friendly driver error if you leave more than, say, a dozen or two dozen sessions open (or whatever is considered a best practice for parallel threads in a client), on the theory that you probably used the anti-pattern of failing to reuse sessions. The latter would be more for expert apps that have some good reason for having hundreds or thousands of simultaneous sessions open. Whether the latter also has some (configurable) limit that is simply a lot higher than the former, or is unlimited, is probably not so important. Or, maybe, simply have a single limit, without the modes, and default it to 10 or 25 or some other relatively low number for “normal” apps. This would be more developer-friendly, for both new and “normal” developers... I think. -- Jack Krupansky From: Marcelo Elias Del Valle Sent: Saturday, August 9, 2014 12:41 AM To: user@cassandra.apache.org Subject: Re: too many open files Indeed, that was my mistake, that was exactly what we were doing in the code. []s 2014-08-09 0:56 GMT-03:00 Brian Zhang yikebo...@gmail.com: For the Cassandra driver, a session is just like a database connection pool; it may contain many TCP connections. If you create a new session every time, more and more TCP connections will be created, until they surpass the max file descriptor limit of the OS. You should create one session and use it repeatedly; a session manages connections automatically, creating new connections or closing old ones for your requests. On Aug 9, 2014, at 6:52, Redmumba redmu...@gmail.com wrote: Just to chime in, I also ran into this issue when I was migrating to the Datastax client. Instead of reusing the session, I was opening a new session each time. For some reason, even though I was still closing the session on the client side, I was getting the same error. Plus, the only way I could recover was by restarting Cassandra.
I did not really see the connections time out over a period of a few minutes. Andrew On Aug 8, 2014 3:19 PM, Andrey Ilinykh ailin...@gmail.com wrote: You may have this problem if your client doesn't reuse the connection but opens a new one every time. So, run netstat and check the number of established connections. This number should not be big. Thank you, Andrey On Fri, Aug 8, 2014 at 12:35 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Hi, I am using Cassandra 2.0.9 running on Debian Wheezy, and I am having too many open files exceptions when I try to perform a large number of operations in my 10 node cluster. I saw the documentation http://www.datastax.com/documentation/cassandra/2.0/cassandra/troubleshooting/trblshootTooManyFiles_r.html and I have set everything to the recommended settings, but I keep getting the errors. In the documentation it says: Another, much less likely possibility, is a file descriptor leak in Cassandra. Run lsof -n | grep java to check that the number of file descriptors opened by Java is reasonable and reports the error if the number is greater than a few thousand. I guess it's not the case, or else a lot of people would be complaining about it, but I am not sure what I could do to solve the problem. Any hint about how to solve it? My client is written in python and uses Cassandra Python Driver.
Here are the exceptions I am having in the client: [s1log] 2014-08-08 12:16:09,631 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.151, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,632 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.142, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,633 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.143, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,634 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.142, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,634 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.145, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,635 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.144, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,635 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.148, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,732 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.146, scheduling retry in 600.0 seconds: [Errno 24] Too
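Andrey's netstat check quoted above can also be done by reading /proc/net/tcp directly on the server. A rough Linux/IPv4-only sketch; 9042 (Cassandra's default native-protocol port) is used as the default, adjust for your configuration:

```python
def established_to_port(port: int = 9042) -> int:
    """Count ESTABLISHED IPv4 TCP connections on a given local port.

    Rough stand-in for `netstat -an | grep ESTABLISHED | grep :9042`,
    run on the Cassandra node: a client that opens a new session per
    request shows up here as an ever-growing connection count.
    """
    count = 0
    with open("/proc/net/tcp") as f:
        next(f)  # skip the header row
        for line in f:
            fields = line.split()
            local_port = int(fields[1].split(":")[1], 16)  # hex port
            state = fields[3]                              # "01" == ESTABLISHED
            if state == "01" and local_port == port:
                count += 1
    return count
```

Since every established socket consumes a file descriptor on the server, a count in the tens of thousands here points at the client connection-leak pattern rather than sstable handles.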
Re: too many open files
Tyler, I’ll see if I can reproduce this on a local instance, but just in case, the error was basically—instead of storing the session in my connection factory, I stored a cluster and called “connect” each time I requested a Session. I had defined a max/min number of connections for the connection itself, maxing out at 128 local/remote. I’m not sure if a Session results in a new file handle on the server side, but I saw the same issue (hundreds of thousands of sockets opened on the server). The cluster was also using hsha; most of the other settings were default in 2.0.7. Andrew On August 8, 2014 at 4:08:50 PM, Tyler Hobbs (ty...@datastax.com) wrote: On Fri, Aug 8, 2014 at 5:52 PM, Redmumba redmu...@gmail.com wrote: Just to chime in, I also ran into this issue when I was migrating to the Datastax client. Instead of reusing the session, I was opening a new session each time. For some reason, even though I was still closing the session on the client side, I was getting the same error. Which driver? If you can still reproduce this, would you mind opening a ticket? (https://datastax-oss.atlassian.net/secure/BrowseProjects.jspa#all) -- Tyler Hobbs DataStax
Re: too many open files
I just had a generator that (in the incorrect way) had a cluster as a member variable, and would call .connect() repeatedly. I _thought_, incorrectly, that the Session was thread unsafe, and so I should request a separate Session each time—obviously wrong in hindsight. There was no special logic; I had a restriction of about 128 connections per host, but the connections were in the 100s of thousands, like the OP mentioned. Again, I’ll see about reproducing it on Monday, but just wanted the repro steps (overall) to live somewhere in case I can’t. :) Andrew On August 8, 2014 at 4:08:50 PM, Tyler Hobbs (ty...@datastax.com) wrote: On Fri, Aug 8, 2014 at 5:52 PM, Redmumba redmu...@gmail.com wrote: Just to chime in, I also ran into this issue when I was migrating to the Datastax client. Instead of reusing the session, I was opening a new session each time. For some reason, even though I was still closing the session on the client side, I was getting the same error. Which driver? If you can still reproduce this, would you mind opening a ticket? (https://datastax-oss.atlassian.net/secure/BrowseProjects.jspa#all) -- Tyler Hobbs DataStax
Re: too many open files
IMHO, I think the drivers are fine. It was a dumb mistake of mine to use sessions as connections and not as connection pools. What was harder to figure out, in my opinion, was that too many connections from the client would increase the number of file descriptors used by the server. I didn't know Linux opens an FD for each connection received, and honestly I still don't know much about the details of this. When I got a too many open files error it took a good while to think about checking the connections. I think the documentation could point out this fact; it would help other people with the same problem. There could be something talking about it here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/troubleshooting/trblshootTooManyFiles_r.html []s 2014-08-09 12:55 GMT-03:00 Jack Krupansky j...@basetechnology.com: Maybe the drivers should have two modes: few sessions, and lots of sessions. The former would give you a developer-friendly driver error if you leave more than say a dozen or two dozen sessions open (or whatever is considered a best practice for parallel threads in a client), on the theory that you probably used the anti-pattern of failing to reuse sessions. The latter would be more for expert apps that have some good reason for having hundreds or thousands of simultaneous sessions open. Whether the latter also has some (configurable) limit that is simply a lot higher than the former or is unlimited, is probably not so important. Or, maybe, simply have a single limit, without the modes and default it to 10 or 25 or some other relatively low number for “normal” apps. This would be more developer-friendly, for both new and “normal” developers... I think. -- Jack Krupansky *From:* Marcelo Elias Del Valle marc...@s1mbi0se.com.br *Sent:* Saturday, August 9, 2014 12:41 AM *To:* user@cassandra.apache.org *Subject:* Re: too many open files Indeed, that was my mistake, that was exactly what we were doing in the code.
[]s 2014-08-09 0:56 GMT-03:00 Brian Zhang yikebo...@gmail.com: For the Cassandra driver, a session is just like a database connection pool; it may contain many TCP connections. If you create a new session every time, more and more TCP connections will be created, until they surpass the max file descriptor limit of the OS. You should create one session and use it repeatedly; a session manages connections automatically, creating new connections or closing old ones for your requests. On Aug 9, 2014, at 6:52, Redmumba redmu...@gmail.com wrote: Just to chime in, I also ran into this issue when I was migrating to the Datastax client. Instead of reusing the session, I was opening a new session each time. For some reason, even though I was still closing the session on the client side, I was getting the same error. Plus, the only way I could recover was by restarting Cassandra. I did not really see the connections time out over a period of a few minutes. Andrew On Aug 8, 2014 3:19 PM, Andrey Ilinykh ailin...@gmail.com wrote: You may have this problem if your client doesn't reuse the connection but opens a new one every time. So, run netstat and check the number of established connections. This number should not be big. Thank you, Andrey On Fri, Aug 8, 2014 at 12:35 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Hi, I am using Cassandra 2.0.9 running on Debian Wheezy, and I am having too many open files exceptions when I try to perform a large number of operations in my 10 node cluster. I saw the documentation http://www.datastax.com/documentation/cassandra/2.0/cassandra/troubleshooting/trblshootTooManyFiles_r.html and I have set everything to the recommended settings, but I keep getting the errors. In the documentation it says: Another, much less likely possibility, is a file descriptor leak in Cassandra. Run lsof -n | grep java to check that the number of file descriptors opened by Java is reasonable and reports the error if the number is greater than a few thousand.
I guess it's not the case, or else a lot of people would be complaining about it, but I am not sure what I could do to solve the problem. Any hint about how to solve it? My client is written in python and uses Cassandra Python Driver. Here are the exceptions I am having in the client: [s1log] 2014-08-08 12:16:09,631 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.151, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,632 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.142, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,633 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.143, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,634 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.142, scheduling retry in 600.0 seconds
Re: too many open files
Another idea to detect this is when the number of open sessions exceeds the number of threads. On Aug 9, 2014 10:59 AM, Andrew redmu...@gmail.com wrote: I just had a generator that (in the incorrect way) had a cluster as a member variable, and would call .connect() repeatedly. I _thought_, incorrectly, that the Session was thread unsafe, and so I should request a separate Session each time—obviously wrong in hindsight. There was no special logic; I had a restriction of about 128 connections per host, but the connections were in the 100s of thousands, like the OP mentioned. Again, I’ll see about reproducing it on Monday, but just wanted the repro steps (overall) to live somewhere in case I can’t. :) Andrew On August 8, 2014 at 4:08:50 PM, Tyler Hobbs (ty...@datastax.com) wrote: On Fri, Aug 8, 2014 at 5:52 PM, Redmumba redmu...@gmail.com wrote: Just to chime in, I also ran into this issue when I was migrating to the Datastax client. Instead of reusing the session, I was opening a new session each time. For some reason, even though I was still closing the session on the client side, I was getting the same error. Which driver? If you can still reproduce this, would you mind opening a ticket? (https://datastax-oss.atlassian.net/secure/BrowseProjects.jspa#all ) -- Tyler Hobbs DataStax http://datastax.com/
Re: too many open files
It really doesn't need to be this complicated. You only need 1 session per application. It's thread safe and manages the connection pool for you. http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/Session.html On Sat, Aug 9, 2014 at 1:29 PM, Kevin Burton bur...@spinn3r.com wrote: Another idea to detect this is when the number of open sessions exceeds the number of threads. On Aug 9, 2014 10:59 AM, Andrew redmu...@gmail.com wrote: I just had a generator that (in the incorrect way) had a cluster as a member variable, and would call .connect() repeatedly. I _thought_, incorrectly, that the Session was thread unsafe, and so I should request a separate Session each time—obviously wrong in hindsight. There was no special logic; I had a restriction of about 128 connections per host, but the connections were in the 100s of thousands, like the OP mentioned. Again, I’ll see about reproducing it on Monday, but just wanted the repro steps (overall) to live somewhere in case I can’t. :) Andrew On August 8, 2014 at 4:08:50 PM, Tyler Hobbs (ty...@datastax.com) wrote: On Fri, Aug 8, 2014 at 5:52 PM, Redmumba redmu...@gmail.com wrote: Just to chime in, I also ran into this issue when I was migrating to the Datastax client. Instead of reusing the session, I was opening a new session each time. For some reason, even though I was still closing the session on the client side, I was getting the same error. Which driver? If you can still reproduce this, would you mind opening a ticket? (https://datastax-oss.atlassian.net/secure/BrowseProjects.jspa#all) -- Tyler Hobbs DataStax -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: too many open files
Yes, that was the problem—I actually knew better, but had briefly overlooked this when I was doing some refactoring. I am not the OP (although he himself realized his mistake). If you follow the thread, I was explaining that the Datastax Java driver allowed me to basically open a significantly large number of connections until the Cassandra server ran out of connections. Tyler was asking for a repro case and requesting that I file a possible bug, if this was something that SHOULD have been caught on the client side (via the max connections client configuration). Andrew On August 9, 2014 at 2:17:57 PM, Jonathan Haddad (j...@jonhaddad.com) wrote: It really doesn't need to be this complicated. You only need 1 session per application. It's thread safe and manages the connection pool for you. http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/Session.html On Sat, Aug 9, 2014 at 1:29 PM, Kevin Burton bur...@spinn3r.com wrote: Another idea to detect this is when the number of open sessions exceeds the number of threads. On Aug 9, 2014 10:59 AM, Andrew redmu...@gmail.com wrote: I just had a generator that (in the incorrect way) had a cluster as a member variable, and would call .connect() repeatedly. I _thought_, incorrectly, that the Session was thread unsafe, and so I should request a separate Session each time—obviously wrong in hindsight. There was no special logic; I had a restriction of about 128 connections per host, but the connections were in the 100s of thousands, like the OP mentioned. Again, I’ll see about reproducing it on Monday, but just wanted the repro steps (overall) to live somewhere in case I can’t. :) Andrew On August 8, 2014 at 4:08:50 PM, Tyler Hobbs (ty...@datastax.com) wrote: On Fri, Aug 8, 2014 at 5:52 PM, Redmumba redmu...@gmail.com wrote: Just to chime in, I also ran into this issue when I was migrating to the Datastax client. Instead of reusing the session, I was opening a new session each time.
For some reason, even though I was still closing the session on the client side, I was getting the same error. Which driver? If you can still reproduce this, would you mind opening a ticket? (https://datastax-oss.atlassian.net/secure/BrowseProjects.jspa#all) -- Tyler Hobbs DataStax -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
too many open files
Hi, I am using Cassandra 2.0.9 running on Debian Wheezy, and I am having too many open files exceptions when I try to perform a large number of operations in my 10 node cluster. I saw the documentation http://www.datastax.com/documentation/cassandra/2.0/cassandra/troubleshooting/trblshootTooManyFiles_r.html and I have set everything to the recommended settings, but I keep getting the errors. In the documentation it says: Another, much less likely possibility, is a file descriptor leak in Cassandra. Run lsof -n | grep java to check that the number of file descriptors opened by Java is reasonable and reports the error if the number is greater than a few thousand. I guess it's not the case, or else a lot of people would be complaining about it, but I am not sure what I could do to solve the problem. Any hint about how to solve it? My client is written in python and uses Cassandra Python Driver. Here are the exceptions I am having in the client: [s1log] 2014-08-08 12:16:09,631 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.151, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,632 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.142, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,633 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.143, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,634 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.142, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,634 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.145, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,635 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.144, scheduling retry in 600.0 seconds: [Errno 24] 
Too many open files [s1log] 2014-08-08 12:16:09,635 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.148, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,732 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.146, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,733 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.77, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,734 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.76, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,734 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.75, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,735 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.142, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,736 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.185, scheduling retry in 600.0 seconds: [Errno 24] Too many open files [s1log] 2014-08-08 12:16:09,942 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.144, scheduling retry in 512.0 seconds: Timed out connecting to 200.200.200.144 [s1log] 2014-08-08 12:16:09,998 - cassandra.pool - WARNING - Error attempting to reconnect to 200.200.200.77, scheduling retry in 512.0 seconds: Timed out connecting to 200.200.200.77 And here is the exception I am having in the server: WARN [Native-Transport-Requests:163] 2014-08-08 14:27:30,499 BatchStatement.java (line 223) Batch of prepared statements for [identification.entity_lookup, identification.entity] is of size 25216, exceeding specified threshold of 5120 by 20096. 
ERROR [Native-Transport-Requests:150] 2014-08-08 14:27:31,611 ErrorMessage.java (line 222) Unexpected exception during request java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:192) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142
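As a complement to the raw `lsof -n | grep java | wc -l` count above, a small sketch (assuming Linux, where `/proc/<pid>/fd` is available) can break a process's descriptors down by kind, which helps tell real file handles (e.g. sstables) from leaked client sockets. The sstable classification here is a rough heuristic, not part of any official tooling.

```python
# Rough helper: classify a process's open file descriptors instead of
# just counting them. Sockets and files both consume fds; this splits
# them apart by reading the /proc/<pid>/fd symlinks (Linux only).

import os
from collections import Counter


def fd_breakdown(pid="self"):
    counts = Counter()
    fd_dir = "/proc/%s/fd" % pid
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed while we were looking
        if target.startswith("socket:"):
            counts["sockets"] += 1
        elif target.startswith(("pipe:", "anon_inode:")):
            counts["pipes/other"] += 1
        elif target.endswith("-Data.db"):
            counts["sstables (heuristic)"] += 1
        else:
            counts["files"] += 1
    return counts


if __name__ == "__main__":
    print(fd_breakdown())  # inspect our own process as a demo
    # For Cassandra, pass its pid: fd_breakdown(pid="12345")
```

If "sockets" dominates, look at client connection handling; if "files" or sstable-like entries dominate, look at compaction backlog.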
Re: too many open files
Are you using apache or Datastax cassandra? The datastax distribution ups the file handle limit to 10. That number's hard to exceed. On Fri, Aug 8, 2014 at 1:35 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: [quoted message and logs snipped; see the original message above]
Re: too many open files
I am using datastax community, the packaged version for Debian. I am also using the latest version of opscenter and datastax-agent. However, I just listed the open files: sudo lsof -n | grep java | wc -l 1096599 It seems it has exceeded the limit. Should I just increase it? Or could it be a memory leak? Best regards, Marcelo. 2014-08-08 17:06 GMT-03:00 Shane Hansen shanemhan...@gmail.com: [quoted message and logs snipped; see the original message above]
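Before just raising the limit, it helps to compare current usage against the process's soft `RLIMIT_NOFILE`. A minimal sketch (assuming Linux; note that `resource.getrlimit` only reports the current process, so for the Cassandra JVM you would read `/proc/<pid>/limits` instead):

```python
# Compare open-fd count against the soft RLIMIT_NOFILE for the current
# process. A count close to the soft limit means EMFILE ("too many open
# files") is imminent; a count far above a sane baseline suggests a leak.

import os
import resource


def fd_headroom():
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    in_use = len(os.listdir("/proc/self/fd"))
    return in_use, soft, hard


if __name__ == "__main__":
    in_use, soft, hard = fd_headroom()
    print("fds in use: %d / soft limit %d (hard limit %d)" % (in_use, soft, hard))
```

Raising the limit buys time, but if the count keeps growing (as in this thread, where client connections were leaking), the limit will be hit again at any value.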
Re: too many open files
You may want to look at the actual filenames. You might have an app leaving them open. Also, remember, sockets use FDs so they are in the list too. On Fri, Aug 8, 2014 at 1:13 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: [quoted message and logs snipped; see the original messages above]
Re: too many open files
I just solved the issue. It was indeed the Cassandra process that was opening too many fds, but the root cause was the number of client connections being opened: the client was opening more connections than it needed. Thanks for the help. []s 2014-08-08 17:17 GMT-03:00 Kevin Burton bur...@spinn3r.com: [quoted message and logs snipped; see the original messages above]
Re: too many open files
You may have this problem if your client doesn't reuse the connection but opens a new one every time. So, run netstat and check the number of established connections. This number should not be big. Thank you, Andrey On Fri, Aug 8, 2014 at 12:35 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: [quoted message and logs snipped; see the original message above]
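Andrey's netstat check can also be done programmatically. A rough sketch (assuming Linux, parsing `/proc/net/tcp`; 9042 is the default native-protocol port, and `01` is the ESTABLISHED state code in that file):

```python
# Rough equivalent of "run netstat and check established connections":
# count ESTABLISHED TCP connections to a given remote port by parsing
# /proc/net/tcp (Linux only). Addresses there are hex-encoded.

ESTABLISHED = "01"  # state code used in /proc/net/tcp


def established_to_port(port=9042, proc_file="/proc/net/tcp"):
    count = 0
    with open(proc_file) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            remote, state = fields[2], fields[3]
            remote_port = int(remote.split(":")[1], 16)
            if state == ESTABLISHED and remote_port == port:
                count += 1
    return count


if __name__ == "__main__":
    print("established connections to 9042:", established_to_port())
```

For a well-behaved client this number should stay small and stable; steady growth points at the per-request session creation discussed in this thread.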
Re: too many open files
Just to chime in, I also ran into this issue when I was migrating to the Datastax client. Instead of reusing the session, I was opening a new session each time. For some reason, even though I was still closing the session on the client side, I was getting the same error. Plus, the only way I could recover was by restarting Cassandra; I did not see the connections time out over a period of a few minutes. Andrew On Aug 8, 2014 3:19 PM, Andrey Ilinykh ailin...@gmail.com wrote: [quoted message and logs snipped; see the original messages above]
Re: too many open files
On Fri, Aug 8, 2014 at 5:52 PM, Redmumba redmu...@gmail.com wrote: Just to chime in, I also ran into this issue when I was migrating to the Datastax client. Instead of reusing the session, I was opening a new session each time. For some reason, even though I was still closing the session on the client side, I was getting the same error. Which driver? If you can still reproduce this, would you mind opening a ticket? (https://datastax-oss.atlassian.net/secure/BrowseProjects.jspa#all) -- Tyler Hobbs DataStax http://datastax.com/
Re: too many open files
Yes, definitely look at how many of the open files are actual file handles versus network sockets. We found a file handle leak in 2.0, but it was patched in 2.0.3 or .5 I think. A million open files is way too high.
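The suggestion above — check how many of the open descriptors are real file handles versus network sockets — can be automated instead of eyeballing lsof output. A minimal sketch (Linux-only, since it reads /proc; the helper name is mine, not from this thread):

```python
import os

def fd_breakdown(pid="self"):
    """Classify a process's open file descriptors by their symlink target.

    On Linux, /proc/<pid>/fd/<n> is a symlink: sockets resolve to
    "socket:[inode]", pipes to "pipe:[inode]", and real files to a path.
    """
    counts = {"file": 0, "socket": 0, "pipe": 0, "other": 0}
    fd_dir = "/proc/{}/fd".format(pid)
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("pipe:"):
            counts["pipe"] += 1
        elif target.startswith("/"):
            counts["file"] += 1
        else:
            counts["other"] += 1
    return counts

if __name__ == "__main__":
    # For another process (e.g. the Cassandra JVM) pass its pid as a string;
    # reading /proc/<pid>/fd for another user's process requires root.
    print(fd_breakdown())
```

Mostly sockets points at a client connection leak (as in this thread); mostly SSTable paths points at compaction falling behind or a server-side descriptor leak.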
Re: too many open files
For the Cassandra driver, a session is like a database connection pool: it may contain many TCP connections. If you create a new session every time, more and more TCP connections will be created until you surpass the OS's maximum file descriptor limit. You should create one session and use it repeatedly; the session manages connections automatically, creating new connections or closing old ones as needed for your requests.
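The point about session reuse can be made concrete with a toy stand-in for the driver (the classes below are hypothetical, not the real Cluster/Session API — they only count connections): a per-request session that is never shut down leaks one pool of connections per request, while a single shared session holds a constant number.

```python
class ToyCluster:
    """Stand-in for a driver Cluster: tracks how many connections exist."""
    POOL_SIZE = 2  # connections opened per session (per node, in a real driver)

    def __init__(self):
        self.open_connections = 0

    def connect(self):
        self.open_connections += self.POOL_SIZE
        return ToySession(self)

class ToySession:
    def __init__(self, cluster):
        self.cluster = cluster

    def shutdown(self):
        self.cluster.open_connections -= ToyCluster.POOL_SIZE

# Anti-pattern: a new session per request, never shut down.
cluster = ToyCluster()
for _ in range(100):
    session = cluster.connect()   # forgot session.shutdown()
leaked = cluster.open_connections  # 200 "connections" (file descriptors) leaked

# Pattern from this thread: one long-lived session reused for every request.
cluster2 = ToyCluster()
shared_session = cluster2.connect()
for _ in range(100):
    pass  # shared_session.execute(...) would go here
steady = cluster2.open_connections  # stays at POOL_SIZE
```

With a real driver the per-session cost is a pool of sockets per node, so a busy service creating sessions per request exhausts the fd limit quickly, exactly as described above.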
Re: too many open files
Indeed, that was my mistake, that was exactly what we were doing in the code.

[]s
Re: Too Many Open Files (sockets) - VNodes - Map/Reduce Job
(This is probably a better question for the user list - cc/reply-to set.)

Allow more files to be open :)
http://www.datastax.com/documentation/cassandra/1.2/cassandra/install/installRecommendSettings.html

--
Kind regards,
Michael

On 06/04/2014 12:15 PM, Florian Dambrine wrote:
Hi everybody,

We are running Elastic MapReduce jobs from Amazon on a 25 node Cassandra cluster (with vnodes). Since we increased the size of the cluster, we are facing a "too many open files" (due to sockets) exception when creating the splits. Does anyone have an idea?

Thanks. Here is the stacktrace:

14/06/04 03:23:24 INFO mapred.JobClient: Default number of map tasks: null
14/06/04 03:23:24 INFO mapred.JobClient: Setting default number of map tasks based on cluster size to : 80
14/06/04 03:23:24 INFO mapred.JobClient: Default number of reduce tasks: 26
14/06/04 03:23:25 INFO security.ShellBasedUnixGroupsMapping: add hadoop to shell userGroupsCache
14/06/04 03:23:25 INFO mapred.JobClient: Setting group to hadoop
14/06/04 03:23:41 ERROR transport.TSocket: Could not configure socket.
java.net.SocketException: Too many open files
        at java.net.Socket.createImpl(Socket.java:447)
        at java.net.Socket.getImpl(Socket.java:510)
        at java.net.Socket.setSoLinger(Socket.java:984)
        at org.apache.thrift.transport.TSocket.initSocket(TSocket.java:118)
        at org.apache.thrift.transport.TSocket.<init>(TSocket.java:109)
        at org.apache.thrift.transport.TSocket.<init>(TSocket.java:94)
        at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:39)
        at org.apache.cassandra.hadoop.ConfigHelper.createConnection(ConfigHelper.java:558)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSubSplits(AbstractColumnFamilyInputFormat.java:286)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.access$200(AbstractColumnFamilyInputFormat.java:61)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:236)
        at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat$SplitCallable.call(AbstractColumnFamilyInputFormat.java:221)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Re: Getting into Too many open files issues
There was a bug introduced in 2.0.0-beta1 related to TTL; a patch just became available in: https://issues.apache.org/jira/browse/CASSANDRA-6275
Re: Getting into Too many open files issues
For some reason, within less than an hour the cassandra node is opening 32768 files and cassandra is not responding after that.

Are you using Levelled Compaction? If so, what value did you set for min_sstable_size? The default has changed from 5 to 160. Increasing the file handles is the right thing to do, but 32K files is a lot.

Cheers
-
Aaron Morton
New Zealand
@aaronmorton
Co-Founder Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
Getting into Too many open files issues
I have been experimenting with the latest Cassandra version for storing huge data in our application. Writes are doing well, but when it comes to reads I have observed that Cassandra is getting into "too many open files" issues. When I check the logs, it is not able to open the Cassandra data files any more because of the file descriptor limit. Can someone suggest where I am going wrong, and what issues could be causing the read operations to lead to the "too many open files" issue?
RE: Getting into Too many open files issues
Hi Murthy,

Did you do a package install (.deb?) or did you download the tar? If the latter, you have to adjust the limits.conf file (/etc/security/limits.conf) to raise the nofile (number of open files) limit for the cassandra user. If you are using the .deb package, the limit is already raised to 100 000 files (it can be found in /etc/init.d/cassandra, FD_LIMIT). However, with 2.0.x I had to raise it to 1 000 000 because 100 000 was too low.

Kind regards,
Pieter Callewaert
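The limits.conf / FD_LIMIT settings discussed here can also be inspected from inside a process. A small sketch using Python's stdlib resource module (the same check applies to any client hitting Errno 24; the numbers printed are whatever the host is configured with):

```python
import resource

# Current file-descriptor limits for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("nofile: soft={} hard={}".format(soft, hard))

# An unprivileged process may raise its own soft limit, but only up to the
# hard limit; raising the hard limit itself needs root, limits.conf, or the
# init script (FD_LIMIT), as described above.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
```

Logging these values at client startup makes "Errno 24" reports much easier to diagnose than reconstructing the limit after the fact.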
Re: Getting into Too many open files issues
Thanks Pieter for the quick reply. I have downloaded the tarball, and have changed limits.conf as per the documentation, like below:

* soft nofile 32768
* hard nofile 32768
root soft nofile 32768
root hard nofile 32768
* soft memlock unlimited
* hard memlock unlimited
root soft memlock unlimited
root hard memlock unlimited
* soft as unlimited
* hard as unlimited
root soft as unlimited
root hard as unlimited
root soft/hard nproc 32000

For some reason, within less than an hour the cassandra node is opening 32768 files and cassandra is not responding after that. It is still not clear why Cassandra is opening that many files and not closing them properly (does the latest Cassandra 2.0.1 version have some bugs?).

What I have been experimenting with is 300 writes per second and 500 reads per second, using a 2 node cluster with 8 core CPU and 32GB RAM (virtual machines). Do we need to increase the nofile limits to more than 32768?
RE: Getting into Too many open files issues
Hi Murthy,

32768 is a bit low (I know the DataStax docs recommend this), but our production env is now running on 1kk (1 000 000), or you can even put it on unlimited.

Pieter
RE: Getting into Too many open files issues
I see 100 000 recommended in the DataStax documentation for the nofile limit since Cassandra 1.2: http://www.datastax.com/documentation/cassandra/2.0/webhelp/cassandra/install/installRecommendSettings.html

-Arindam
Re: Too many open files with Cassandra 1.2.11
What’s in /etc/security/limits.conf? And just for fun, what does lsof -n | grep java | wc -l say?

Cheers

- Aaron Morton
New Zealand
@aaronmorton
Co-Founder
Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 30/10/2013, at 12:21 am, Oleg Dulin oleg.du...@gmail.com wrote:

Got this error:

WARN [Thread-8] 2013-10-29 02:58:24,565 CustomTThreadPoolServer.java (line 122) Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:109)
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36)
    at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:110)
    at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:111)

I haven't seen this since 1.0 days. I thought 1.1.11 had it all fixed. ulimit outputs unlimited. What could cause this? Any help is greatly appreciated.

--
Regards,
Oleg Dulin
http://www.olegdulin.com
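A caveat on the lsof one-liner Aaron suggests: on some systems lsof emits one row per task (thread) rather than per process, and it also lists memory-mapped files, so piping it through wc -l can vastly overstate the descriptor count. Counting the entries under /proc/<pid>/fd is a stricter measure. A sketch, assuming Linux; the PID lookup is illustrative:

```shell
# Count real file descriptors held by one process. Each entry in
# /proc/<pid>/fd is exactly one open descriptor, with no per-thread
# or mmap duplication.
pid=$$   # substitute the Cassandra PID, e.g. $(pgrep -f CassandraDaemon)
ls /proc/"$pid"/fd | wc -l
```

Comparing this number against the lsof count shows how much of the "open files" figure is duplication rather than real descriptors.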
Too many open files with Cassandra 1.2.11
Got this error:

WARN [Thread-8] 2013-10-29 02:58:24,565 CustomTThreadPoolServer.java (line 122) Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:109)
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36)
    at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:110)
    at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:111)

I haven't seen this since 1.0 days. I thought 1.1.11 had it all fixed. ulimit outputs unlimited. What could cause this? Any help is greatly appreciated.

--
Regards,
Oleg Dulin
http://www.olegdulin.com
Too many open files (Cassandra 2.0.1)
Hi,

I've noticed some nodes in our cluster are dying after some period of time.

WARN [New I/O server boss #17] 2013-10-29 12:22:20,725 Slf4JLogger.java (line 76) Failed to accept a connection.
java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
    at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

And other exceptions related to the same cause. Now, as we use the Cassandra package, the nofile limit is raised to 100 000. To double check if this is correct:

root@de-cass09 ~ # cat /proc/18332/limits
Limit            Soft Limit   Hard Limit   Units
...
Max open files   100000       100000      files
...

Now I check how many files are open:

root@de-cass09 ~ # lsof -n -p 18332 | wc -l
100038

This seems an awful lot for size-tiered compaction...? Now I noticed when I checked the list, a (deleted) file passed a lot:

...
java 18332 cassandra 4704r REG 8,1 10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
java 18332 cassandra 4705r REG 8,1 10911921661 2147483839 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
...

Actually, if I count specifically for this file:

root@de-cass09 ~ # lsof -n -p 18332 | grep mapdata040-hos-jb-7648-Data.db | wc -l
52707

Other nodes are around a total of 350 files open... Any idea why the number of open files is so high? The first exception I see is this:

WARN [New I/O worker #8] 2013-10-29 12:09:34,440 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
java.lang.NullPointerException
    at sun.nio.ch.EPollArrayWrapper.setUpdateEvents(EPollArrayWrapper.java:178)
    at sun.nio.ch.EPollArrayWrapper.add(EPollArrayWrapper.java:227)
    at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:164)
    at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:133)
    at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:209)
    at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:151)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

Several minutes later I get Too many open files.

Specs: 12-node cluster with Ubuntu 12.04 LTS, Cassandra 2.0.1 (DataStax packages), using JBOD of 2 disks. JNA enabled.

Any suggestions?

Kind regards,
Pieter Callewaert
Web IT engineer
Web: www.be-mobile.be
Email: pieter.callewa...@be-mobile.be
Tel: + 32 9 330 51 80
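Pieter's diagnosis, tens of thousands of descriptors still pointing at one already-deleted SSTable, reduces to a one-line filter over lsof output. A sketch using canned lsof-style lines, since the real command (lsof -n -p <pid>) needs a live process; the paths shown are illustrative:

```shell
# Count descriptors that still reference files deleted from disk.
# Piping real `lsof -n -p <pid>` output through the same grep gives
# the size of the leak directly.
count_deleted() { grep -c '(deleted)'; }

count_deleted <<'EOF'
java 18332 cassandra 4704r REG 8,1 10911921661 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
java 18332 cassandra 4705r REG 8,1 10911921661 /data1/mapdata040/hos/mapdata040-hos-jb-7648-Data.db (deleted)
java 18332 cassandra 484u REG 8,1 33554432 /var/lib/cassandra/commitlog/CommitLog-2.log
EOF
```

Run as-is on the sample above, this prints 2. A large count here points at a descriptor leak (handles to compacted-away SSTables never being closed) rather than a genuinely large working set.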
Re: too many open files
Also, looking through the log, it appears a lot of the files end with ic- which I assume is associated with a secondary index I have on the table. Are secondary indexes really expensive from a file descriptor standpoint? That particular table uses the default compaction scheme... On Jul 15, 2013, at 12:00 AM, Paul Ingalls paulinga...@gmail.com wrote: I have one table that is using leveled. It was set to 10MB, I will try changing it to 256MB. Is there a good way to merge the existing sstables? On Jul 14, 2013, at 5:32 PM, Jonathan Haddad j...@jonhaddad.com wrote: Are you using leveled compaction? If so, what do you have the file size set at? If you're using the defaults, you'll have a ton of really small files. I believe Albert Tobey recommended using 256MB for the table sstable_size_in_mb to avoid this problem. On Sun, Jul 14, 2013 at 5:10 PM, Paul Ingalls paulinga...@gmail.com wrote: I'm running into a problem where instances of my cluster are hitting over 450K open files. Is this normal for a 4 node 1.2.6 cluster with replication factor of 3 and about 50GB of data on each node? I can push the file descriptor limit up, but I plan on having a much larger load so I'm wondering if I should be looking at something else…. Let me know if you need more info… Paul -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: too many open files
It doesn't tell you anything if a file ends with ic-###, except pointing out the SSTable version it uses (ic in this case). Files related to a secondary index contain something like this in the filename: KS-CF.IDX-NAME, while filenames of regular CFs do not contain any dots except the one just before the file extension.

M.

On 15.07.2013 09:38, Paul Ingalls wrote:

Also, looking through the log, it appears a lot of the files end with ic- which I assume is associated with a secondary index I have on the table. Are secondary indexes really expensive from a file descriptor standpoint? That particular table uses the default compaction scheme... On Jul 15, 2013, at 12:00 AM, Paul Ingalls paulinga...@gmail.com wrote: I have one table that is using leveled. It was set to 10MB, I will try changing it to 256MB. Is there a good way to merge the existing sstables? On Jul 14, 2013, at 5:32 PM, Jonathan Haddad j...@jonhaddad.com wrote: Are you using leveled compaction? If so, what do you have the file size set at? If you're using the defaults, you'll have a ton of really small files. I believe Albert Tobey recommended using 256MB for the table sstable_size_in_mb to avoid this problem. On Sun, Jul 14, 2013 at 5:10 PM, Paul Ingalls paulinga...@gmail.com wrote: I'm running into a problem where instances of my cluster are hitting over 450K open files. Is this normal for a 4 node 1.2.6 cluster with replication factor of 3 and about 50GB of data on each node? I can push the file descriptor limit up, but I plan on having a much larger load so I'm wondering if I should be looking at something else…. Let me know if you need more info… Paul

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
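Michał's naming rule can be turned into a small filter, handy when eyeballing an lsof dump. A sketch; the first filename is taken from output quoted elsewhere in this digest, while the second is hypothetical, constructed to follow the KS-CF.IDX-NAME pattern he describes:

```shell
# Classify an SSTable filename: secondary-index tables keep a dot inside
# the column-family part (KS-CF.IDX-NAME-...), regular CFs only have the
# dot that starts the file extension.
classify_sstable() {
  base=${1%.*}                    # drop the .db extension
  case $base in
    *.*) echo "index" ;;          # a dot survives -> secondary index
    *)   echo "regular" ;;
  esac
}

classify_sstable 'Ks100K-ReverseLabelFunction-ic-1512-Data.db'   # prints regular
classify_sstable 'Ks100K-Users.users_email_idx-ic-12-Data.db'    # prints index
```

Piped over the path column of lsof output, this separates index-driven descriptors from regular CF descriptors at a glance.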
Re: too many open files
Odd that this discussion happens now, as I'm also getting this error. I get a burst of error messages and then the system continues... with no apparent ill effect. I can't tell what the system was doing at the time; here is the stack. BTW, OpsCenter says I only have 4 or 5 SSTables in each of my 6 CFs.

ERROR [ReadStage:62384] 2013-07-14 18:04:26,062 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ReadStage:62384,5,main]
java.io.IOError: java.io.FileNotFoundException: /tmp_vol/cassandra/data/dev_a/portfoliodao/dev_a-portfoliodao-hf-166-Data.db (Too many open files)
    at org.apache.cassandra.io.util.CompressedSegmentedFile.getSegment(CompressedSegmentedFile.java:69)
    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:898)
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.init(SSTableNamesIterator.java:63)
    at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:61)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:79)
    at org.apache.cassandra.db.CollationController.collectTimeOrderedData(CollationController.java:124)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
    at org.apache.cassandra.db.Table.getRow(Table.java:378)
    at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:58)
    at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /tmp_vol/cassandra/data/dev_a/portfoliodao/dev_a-portfoliodao-hf-166-Data.db (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.init(RandomAccessFile.java:216)
    at org.apache.cassandra.io.util.RandomAccessReader.init(RandomAccessReader.java:67)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.init(CompressedRandomAccessReader.java:64)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:46)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:41)
    at org.apache.cassandra.io.util.CompressedSegmentedFile.getSegment(CompressedSegmentedFile.java:63)
    ... 16 more

On Mon, Jul 15, 2013 at 7:23 AM, Michał Michalski mich...@opera.com wrote:

It doesn't tell you anything if a file ends with ic-###, except pointing out the SSTable version it uses (ic in this case). Files related to a secondary index contain something like this in the filename: KS-CF.IDX-NAME, while filenames of regular CFs do not contain any dots except the one just before the file extension.

M.

On 15.07.2013 09:38, Paul Ingalls wrote:

Also, looking through the log, it appears a lot of the files end with ic- which I assume is associated with a secondary index I have on the table. Are secondary indexes really expensive from a file descriptor standpoint? That particular table uses the default compaction scheme...

On Jul 15, 2013, at 12:00 AM, Paul Ingalls paulinga...@gmail.com wrote: I have one table that is using leveled. It was set to 10MB, I will try changing it to 256MB. Is there a good way to merge the existing sstables?

On Jul 14, 2013, at 5:32 PM, Jonathan Haddad j...@jonhaddad.com wrote: Are you using leveled compaction? If so, what do you have the file size set at?
If you're using the defaults, you'll have a ton of really small files. I believe Albert Tobey recommended using 256MB for the table sstable_size_in_mb to avoid this problem. On Sun, Jul 14, 2013 at 5:10 PM, Paul Ingalls paulinga...@gmail.com wrote: I'm running into a problem where instances of my cluster are hitting over 450K open files. Is this normal for a 4 node 1.2.6 cluster with replication factor of 3 and about 50GB of data on each node? I can push the file descriptor limit up, but I plan on having a much larger load so I'm wondering if I should be looking at something else…. Let me know if you need more info… Paul -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
too many open files
I'm running into a problem where instances of my cluster are hitting over 450K open files. Is this normal for a 4 node 1.2.6 cluster with replication factor of 3 and about 50GB of data on each node? I can push the file descriptor limit up, but I plan on having a much larger load so I'm wondering if I should be looking at something else…. Let me know if you need more info… Paul
Re: too many open files
Are you using leveled compaction? If so, what do you have the file size set at? If you're using the defaults, you'll have a ton of really small files. I believe Albert Tobey recommended using 256MB for the table sstable_size_in_mb to avoid this problem. On Sun, Jul 14, 2013 at 5:10 PM, Paul Ingalls paulinga...@gmail.com wrote: I'm running into a problem where instances of my cluster are hitting over 450K open files. Is this normal for a 4 node 1.2.6 cluster with replication factor of 3 and about 50GB of data on each node? I can push the file descriptor limit up, but I plan on having a much larger load so I'm wondering if I should be looking at something else…. Let me know if you need more info… Paul -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
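For reference, the sstable_size_in_mb change Jon describes is set per table. In CQL3 (available from Cassandra 1.2) the statement looks roughly like this; the keyspace and table names are placeholders, and existing small SSTables are only rewritten as they get recompacted (nodetool upgradesstables forces a rewrite):

```sql
-- Hypothetical keyspace/table; raises the LCS target size from a small
-- default to the 256MB suggested above.
ALTER TABLE my_ks.my_table
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': 256};
```

This answers Paul's follow-up about merging the existing small sstables: they are not merged at ALTER time, only as compaction touches them again.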
Too many open files and stopped compaction with many pending compaction tasks
On a test with 3 Cassandra servers, version 1.2.5, with replication factor 1 and leveled compaction, I did a store last night and I did not see any problem with Cassandra. On all 3 machines compaction has been stopped for several hours already. However, one machine reports 650 pending compaction tasks (via JMX). compaction_throughput_mb_per_sec is 0. concurrent_compactors is 3. multithreaded_compaction = false. No other load on these machines.

And when I start querying (using thrift), I get a 'too many open files' error on the machine with pending compaction tasks. The limits.conf setting for nofile is 65536. Using 'lsof' and 'wc -l' I get a count of 59577 files for Cassandra. Total count of keyspace files on disk: 20464.

The 3 machines have an equal (+/-) data load of about 60 GB. I see that 2 machines have no unleveled sstables, or just 1, on any keyspace, but on the machine with troubles there is one keyspace having 670 unleveled sstables. Level sstable histogram [670,28,106,14], thus 818 sstables. An 'ls' on that directory counts 5729 files, which corresponds to the 818 sstables (7 files per sstable).

After a restart of that machine I get 4037 open files for Cassandra, and compaction has restarted. Once finished I get SSTableCountPerLevel = [0, 10, 109, 644]. Also, compaction reports speeds of 2.5 MB per sec. Seems slow to me. CPU less than 10%, disk 15% with peaks to 45% (15000 rpm SCSI). 14 GB free memory.

So I am puzzled about the number of open files and number of unleveled sstables, and a not so fast compaction. Anything that can be done? Or to be done so that the next time I can get more useful information?
Regards, Ignace Example output of lsof is : java10968 root 483r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 484u REG8,1 33554432 29229231 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568123.log java10968 root 485r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 486r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 487r REG 8,17 39967253 14158943 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Data.db java10968 root 488r REG 8,17 58641524 14158942 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Index.db java10968 root 489r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 490r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 491r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 492u REG8,1 33554432 29230501 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568134.log java10968 root 493r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 494r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 495r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 497u REG8,1 33554432 29242455 
/home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568126.log java10968 root 498r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 499r REG 8,17 39725539 14160146 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-1019-Data.db java10968 root 500r REG 8,17 56369841 14160005 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-1019-Index.db java10968 root 502r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 504r REG 8,17 1989198 14163384 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-922-Data.db java10968 root 505r REG 8,17 40679209 14161763
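When compaction appears stalled with a large pending backlog like this, nodetool gives a quick first look before resorting to JMX. A sketch of the usual checks (these commands exist in the 1.2 line; run them on the affected node):

```shell
nodetool compactionstats           # pending task count plus any active compactions
nodetool setcompactionthroughput 0 # 0 removes the MB/s throttle entirely
```

If compactionstats shows pending tasks but nothing active, that points at a stuck compaction executor rather than throttling, which matches the behaviour described above where a restart kicked compaction back into motion.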
Re: Too many open files and stopped compaction with many pending compaction tasks
Are you on SSDs?

On 27 Jun 2013, at 14:24, Desimpel, Ignace ignace.desim...@nuance.com wrote:

On a test with 3 Cassandra servers, version 1.2.5, with replication factor 1 and leveled compaction, I did a store last night and I did not see any problem with Cassandra. On all 3 machines compaction has been stopped for several hours already. However, one machine reports 650 pending compaction tasks (via JMX). compaction_throughput_mb_per_sec is 0. concurrent_compactors is 3. multithreaded_compaction = false. No other load on these machines. And when I start querying (using thrift), I get a 'too many open files' error on the machine with pending compaction tasks. The limits.conf setting for nofile is 65536. Using 'lsof' and 'wc -l' I get a count of 59577 files for Cassandra. Total count of keyspace files on disk: 20464. The 3 machines have an equal (+/-) data load of about 60 GB. I see that 2 machines have no unleveled sstables, or just 1, on any keyspace, but on the machine with troubles there is one keyspace having 670 unleveled sstables. Level sstable histogram [670,28,106,14], thus 818 sstables. An 'ls' on that directory counts 5729 files, which corresponds to the 818 sstables (7 files per sstable). After a restart of that machine I get 4037 open files for Cassandra, and compaction has restarted. Once finished I get SSTableCountPerLevel = [0, 10, 109, 644]. Also, compaction reports speeds of 2.5 MB per sec. Seems slow to me. CPU less than 10%, disk 15% with peaks to 45% (15000 rpm SCSI). 14 GB free memory. So I am puzzled about the number of open files and number of unleveled sstables, and a not so fast compaction. Anything that can be done? Or to be done so that the next time I can get more useful information?
Regards, Ignace Example output of lsof is : java10968 root 483r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 484u REG8,1 33554432 29229231 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568123.log java10968 root 485r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 486r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 487r REG 8,17 39967253 14158943 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Data.db java10968 root 488r REG 8,17 58641524 14158942 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Index.db java10968 root 489r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 490r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 491r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 492u REG8,1 33554432 29230501 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568134.log java10968 root 493r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 494r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 495r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 497u REG8,1 33554432 29242455 
/home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568126.log java10968 root 498r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 499r REG 8,17 39725539 14160146 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-1019-Data.db java10968 root 500r REG 8,17 56369841 14160005 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-1019-Index.db java10968 root 502r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 504r REG 8,17 1989198
RE: Too many open files and stopped compaction with many pending compaction tasks
No : just two 15000 rpm scsi disks per machine. Each disk can handle more than 100MB/sec streaming data (tested). Iostat reports service times of 2 or 3 milli sec. Ubuntu 12.04 LTS 48 GB memory, 24 CPU Xeon X 5670 Cassandra is started with 8GB. -Original Message- From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] Sent: donderdag 27 juni 2013 15:36 To: user@cassandra.apache.org Subject: Re: Too many open files and stopped compaction with many pending compaction tasks Are you on SSDs? On 27 Jun 2013, at 14:24, Desimpel, Ignace ignace.desim...@nuance.com wrote: On a test with 3 cassandra servers version 1.2.5 with replication factor 1 and leveled compaction, I did a store last night and I did not see any problem with Cassandra. On all 3 machine the compaction is stopped already several hours. However , one machine reports 650 pending compaction tasks (via jmx). compaction_throughput_mb_per_sec is 0. Concurrent_compactors is 3. multithreaded_compaction = false. No other load on these machines. And when I start querying (using thrift), I get a 'too many open files' error on the machine with pending compaction tasks. Limits.conf setting for nofile is 65536 Using 'lsof' and 'wc -l' I get a count of 59577 files for Cassandra. Total count of keyspace files on disk : 20464. The 3 machines have an equal (+/-) data load of about 60 GB. I see that 2 machines have no unleveled or just 1 sstables on any keyspace, but on the machine with troubles there is one keyspace having 670 unleveled sstables. Level sstable histo [670,28,106,14] thus 818 sstables. An 'ls' on that directory counts for 5729 files, which corresponds to the 818 sstable (7 files per sstables). After restart of that machine I get 4037 open files for Cassandra. And also compaction has restarted. Once finisched I get SSTableCountPerLEvel = [0,10, 109, 644]. Also, compaction reports speeds of 2.5 MB per sec. Seems slow too me. CPU less than 10%, Disk 15% with peeks to 45% (15000 rpm scsi). 14 GB free memory. 
So I am puzzled about the number of open files and number of unleveled sstables, and a not so fast compaction. Anything than can be done? Or to be done so that the next time I can get more useful information? Regards, Ignace Example output of lsof is : java10968 root 483r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 484u REG8,1 33554432 29229231 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568123.log java10968 root 485r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 486r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 487r REG 8,17 39967253 14158943 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Data.db java10968 root 488r REG 8,17 58641524 14158942 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Index.db java10968 root 489r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 490r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 491r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 492u REG8,1 33554432 29230501 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568134.log java10968 root 493r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 494r REG 8,17 10507031 14156174 
/media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 495r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 497u REG8,1 33554432 29242455 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568126.log java10968 root 498r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db java10968 root 499r REG 8,17 39725539 14160146 /media/datadrive1
Too Many Open files error
While running nodetool repair, we are running into a FileNotFoundException with a too many open files error. We increased the ulimit value to 32768, and still we have seen this issue. The number of files in the data directory is around 29500+. If we further increase the ulimit, would it help?

While tracking the log file for the specific file for which it threw the FileNotFoundException, I observed that it was part of a compaction. Does it have anything to do with it? We are using 1.1.4.
Re: Too Many Open files error
This bug is fixed in 1.1.5.

Andrey

On Thu, Dec 20, 2012 at 12:01 AM, santi kumar santi.ku...@gmail.com wrote: While running nodetool repair, we are running into a FileNotFoundException with a too many open files error. We increased the ulimit value to 32768, and still we have seen this issue. The number of files in the data directory is around 29500+. If we further increase the ulimit, would it help? While tracking the log file for the specific file for which it threw the FileNotFoundException, I observed that it was part of a compaction. Does it have anything to do with it? We are using 1.1.4.
Re: Too Many Open files error
Can you please give more details about this bug? A bug id or something. Now if I want to upgrade, is there any specific process or best practices?

Thanks,
Santi

On Thu, Dec 20, 2012 at 1:44 PM, Andrey Ilinykh ailin...@gmail.com wrote: This bug is fixed in 1.1.5. Andrey

On Thu, Dec 20, 2012 at 12:01 AM, santi kumar santi.ku...@gmail.com wrote: While running nodetool repair, we are running into a FileNotFoundException with a too many open files error. We increased the ulimit value to 32768, and still we have seen this issue. The number of files in the data directory is around 29500+. If we further increase the ulimit, would it help? While tracking the log file for the specific file for which it threw the FileNotFoundException, I observed that it was part of a compaction. Does it have anything to do with it? We are using 1.1.4.
Re: Too Many Open files error
On Thu, Dec 20, 2012 at 1:17 AM, santi kumar santi.ku...@gmail.com wrote:

> Can you please give more details about this bug? A bug id or something.

https://issues.apache.org/jira/browse/CASSANDRA-4571

> Now if I want to upgrade, is there any specific process or best practices?

Migration from 1.1.4 to 1.1.5 is straightforward: install 1.1.5, stop 1.1.4 (nodetool drain), start 1.1.5. http://www.datastax.com/docs/1.0/install/upgrading#completing-upgrade

Andrey

Thanks, Santi

On Thu, Dec 20, 2012 at 1:44 PM, Andrey Ilinykh ailin...@gmail.com wrote: This bug is fixed in 1.1.5. Andrey

On Thu, Dec 20, 2012 at 12:01 AM, santi kumar santi.ku...@gmail.com wrote: While running nodetool repair, we are running into a FileNotFoundException with a too many open files error. We increased the ulimit value to 32768, and still we have seen this issue. The number of files in the data directory is around 29500+. If we further increase the ulimit, would it help? While tracking the log file for the specific file for which it threw the FileNotFoundException, I observed that it was part of a compaction. Does it have anything to do with it? We are using 1.1.4.
Re: Too Many Open files error
> The number of files in the data directory is around 29500+.

If you are using Levelled Compaction it is probably easier to set the ulimit to unlimited.

Cheers

- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 21/12/2012, at 6:34 AM, Andrey Ilinykh ailin...@gmail.com wrote: On Thu, Dec 20, 2012 at 1:17 AM, santi kumar santi.ku...@gmail.com wrote: Can you please give more details about this bug? A bug id or something. https://issues.apache.org/jira/browse/CASSANDRA-4571 Now if I want to upgrade, is there any specific process or best practices? Migration from 1.1.4 to 1.1.5 is straightforward: install 1.1.5, stop 1.1.4 (nodetool drain), start 1.1.5. http://www.datastax.com/docs/1.0/install/upgrading#completing-upgrade Andrey Thanks Santi On Thu, Dec 20, 2012 at 1:44 PM, Andrey Ilinykh ailin...@gmail.com wrote: This bug is fixed in 1.1.5. Andrey On Thu, Dec 20, 2012 at 12:01 AM, santi kumar santi.ku...@gmail.com wrote: While running nodetool repair, we are running into a FileNotFoundException with a too many open files error. We increased the ulimit value to 32768, and still we have seen this issue. The number of files in the data directory is around 29500+. If we further increase the ulimit, would it help? While tracking the log file for the specific file for which it threw the FileNotFoundException, I observed that it was part of a compaction. Does it have anything to do with it? We are using 1.1.4.
Re: cassandra hit a wall: Too many open files (98567!)
Ah, that explains part of the problem indeed. The whole situation still doesn't make a lot of sense to me, unless the answer is that the default sstable size with leveled compaction is just no good for large datasets. I restarted cassandra a few hours ago and it had to open about 32k files at start-up. Took about 15 minutes. That just can't be good... I also noticed that when using compression the sstable size specified is uncompressed, so the actual files tend to be smaller. I now upped the sstable size to 100MB, which should result in about 40MB files in my case. Is there a way I can compact some of the existing sstables that are small? For example, I have a level-4 sstable that is 56KB in size and many more that are rather small. Does nodetool compact do anything with leveled compaction? On 1/18/2012 2:39 AM, Janne Jalkanen wrote: 1.0.6 has a file leak problem, fixed in 1.0.7. Perhaps this is the reason? https://issues.apache.org/jira/browse/CASSANDRA-3616 /Janne On Jan 18, 2012, at 03:52 , dir dir wrote: Very interesting. Why do you open so many files? What kind of system are you building that opens so many files? Would you tell us? Thanks... On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken t...@rightscale.com wrote: I'm running a single node cassandra 1.0.6 server which hit a wall yesterday: ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:2918,1,main] java.io.IOError: java.io.FileNotFoundException: /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many open files in system) After that it stopped working and just sat there with this error (understandable). I did an lsof and saw that it had 98567 open files, yikes! An ls in the data directory shows 234011 files. After restarting it spent about 5 hours compacting, then quieted down. About 173k files left in the data directory. 
I'm using leveled compaction (with compression). I looked into the json of the two large CFs and gen 0 is empty, most sstables are gen 3 and 4. I have a total of about 150GB of data (compressed). Almost all the sstables are around 3MB in size. Aren't they supposed to get 10x bigger at higher gens? This situation can't be healthy, can it? Suggestions?
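The arithmetic behind these file counts can be sketched roughly. The figures below come from the thread (150 GB of data, ~3 MB sstables); the assumption that each sstable is about 5 on-disk components (Data, Index, Filter, Statistics, ...) is illustrative and varies by version and options:

```shell
DATA_MB=$((150 * 1024))   # ~150 GB of compressed data
FILES_PER_SSTABLE=5       # assumed number of on-disk components per sstable

# With ~3 MB sstables:
echo $((DATA_MB / 3 * FILES_PER_SSTABLE))    # on the order of 256000 files

# With ~100 MB sstables (the new setting), far fewer:
echo $((DATA_MB / 100 * FILES_PER_SSTABLE))  # on the order of 7680 files
```

So raising the sstable size by ~30x cuts the file count by the same factor, which is consistent with the ~234k files observed before the change.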
Re: cassandra hit a wall: Too many open files (98567!)
On Fri, Jan 13, 2012 at 8:01 PM, Thorsten von Eicken t...@rightscale.com wrote: I'm running a single node cassandra 1.0.6 server which hit a wall yesterday: ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:2918,1,main] java.io.IOError: java.io.FileNotFoundException: /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many open files in system) After that it stopped working and just sat there with this error (understandable). I did an lsof and saw that it had 98567 open files, yikes! An ls in the data directory shows 234011 files. After restarting it spent about 5 hours compacting, then quieted down. About 173k files left in the data directory. I'm using leveled compaction (with compression). I looked into the json of the two large CFs and gen 0 is empty, most sstables are gen 3 and 4. I have a total of about 150GB of data (compressed). Almost all the sstables are around 3MB in size. Aren't they supposed to get 10x bigger at higher gens? No, with leveled compaction, the (max) size of sstables is fixed whatever the generation is (the default is 5MB, but it's 5MB of uncompressed data (we may change that though) so 3MB sounds about right). What changes between generations is the number of sstables it can contain. Gen 1 can have 10 sstables (it can have more, but only temporarily), Gen 2 can have 100, Gen 3 can have 1000, etc. So again, that most sstables are in gen 3 and 4 is expected too. This situation can't be healthy, can it? Suggestions? Leveled compaction uses lots of files (the number is proportional to the amount of data). It is not necessarily a big problem, as modern OSes deal with large numbers of open files fairly well (as far as I know at least). 
I would just up the file descriptor ulimit and not worry too much about it, unless you have reasons to believe that it's an actual descriptor leak (but given the number of files you have, the number of open ones doesn't seem off so I don't think there is one here) or that this has performance impacts. -- Sylvain
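The per-generation capacities described above can be put into numbers; a small back-of-envelope sketch assuming the 5 MB default size mentioned in the reply:

```shell
SSTABLE_MB=5   # default (uncompressed) sstable size for leveled compaction
echo "Gen 1: up to 10 sstables (~$((10 * SSTABLE_MB)) MB)"
echo "Gen 2: up to 100 sstables (~$((100 * SSTABLE_MB)) MB)"
echo "Gen 3: up to 1000 sstables (~$((1000 * SSTABLE_MB)) MB)"
echo "Gen 4: up to 10000 sstables (~$((10000 * SSTABLE_MB)) MB)"
```

With fixed-size sstables, ~150 GB of data necessarily means tens of thousands of them, which matches the file counts reported earlier in the thread.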
Re: cassandra hit a wall: Too many open files (98567!)
1.0.6 has a file leak problem, fixed in 1.0.7. Perhaps this is the reason? https://issues.apache.org/jira/browse/CASSANDRA-3616 /Janne On Jan 18, 2012, at 03:52 , dir dir wrote: Very interesting. Why do you open so many files? What kind of system are you building that opens so many files? Would you tell us? Thanks... On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken t...@rightscale.com wrote: I'm running a single node cassandra 1.0.6 server which hit a wall yesterday: ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:2918,1,main] java.io.IOError: java.io.FileNotFoundException: /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many open files in system) After that it stopped working and just sat there with this error (understandable). I did an lsof and saw that it had 98567 open files, yikes! An ls in the data directory shows 234011 files. After restarting it spent about 5 hours compacting, then quieted down. About 173k files left in the data directory. I'm using leveled compaction (with compression). I looked into the json of the two large CFs and gen 0 is empty, most sstables are gen 3 and 4. I have a total of about 150GB of data (compressed). Almost all the sstables are around 3MB in size. Aren't they supposed to get 10x bigger at higher gens? This situation can't be healthy, can it? Suggestions?
Re: cassandra hit a wall: Too many open files (98567!)
Very interesting. Why do you open so many files? What kind of system are you building that opens so many files? Would you tell us? Thanks... On Sat, Jan 14, 2012 at 2:01 AM, Thorsten von Eicken t...@rightscale.com wrote: I'm running a single node cassandra 1.0.6 server which hit a wall yesterday: ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:2918,1,main] java.io.IOError: java.io.FileNotFoundException: /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many open files in system) After that it stopped working and just sat there with this error (understandable). I did an lsof and saw that it had 98567 open files, yikes! An ls in the data directory shows 234011 files. After restarting it spent about 5 hours compacting, then quieted down. About 173k files left in the data directory. I'm using leveled compaction (with compression). I looked into the json of the two large CFs and gen 0 is empty, most sstables are gen 3 and 4. I have a total of about 150GB of data (compressed). Almost all the sstables are around 3MB in size. Aren't they supposed to get 10x bigger at higher gens? This situation can't be healthy, can it? Suggestions?
Re: cassandra hit a wall: Too many open files (98567!)
That sounds like too many sstables. Out of interest, were you using multi-threaded compaction? Just wondering about this https://issues.apache.org/jira/browse/CASSANDRA-3711 Can you set the file handles to unlimited? Can you provide some more info on what you see in the data dir, in case it is a bug in leveled compaction? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/01/2012, at 8:01 AM, Thorsten von Eicken wrote: I'm running a single node cassandra 1.0.6 server which hit a wall yesterday: ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:2918,1,main] java.io.IOError: java.io.FileNotFoundException: /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many open files in system) After that it stopped working and just sat there with this error (understandable). I did an lsof and saw that it had 98567 open files, yikes! An ls in the data directory shows 234011 files. After restarting it spent about 5 hours compacting, then quieted down. About 173k files left in the data directory. I'm using leveled compaction (with compression). I looked into the json of the two large CFs and gen 0 is empty, most sstables are gen 3 and 4. I have a total of about 150GB of data (compressed). Almost all the sstables are around 3MB in size. Aren't they supposed to get 10x bigger at higher gens? This situation can't be healthy, can it? Suggestions?
cassandra hit a wall: Too many open files (98567!)
I'm running a single node cassandra 1.0.6 server which hit a wall yesterday: ERROR [CompactionExecutor:2918] 2012-01-12 20:37:06,327 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[CompactionExecutor:2918,1,main] java.io.IOError: java.io.FileNotFoundException: /mnt/ebs/data/rslog_production/req_word_idx-hc-453661-Data.db (Too many open files in system) After that it stopped working and just sat there with this error (understandable). I did an lsof and saw that it had 98567 open files, yikes! An ls in the data directory shows 234011 files. After restarting it spent about 5 hours compacting, then quieted down. About 173k files left in the data directory. I'm using leveled compaction (with compression). I looked into the json of the two large CFs and gen 0 is empty, most sstables are gen 3 and 4. I have a total of about 150GB of data (compressed). Almost all the sstables are around 3MB in size. Aren't they supposed to get 10x bigger at higher gens? This situation can't be healthy, can it? Suggestions?
Too many open files
All: What does the following error mean? One of my cassandra servers prints this error, and nodetool shows the state of the server as down. Netstat shows very few open sockets. WARN [main] 2011-07-27 16:14:04,872 CustomTThreadPoolServer.java (line 104) Transport error occurred during acceptance of message. org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124) at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35) at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31) at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98) at org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:183) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224) Caused by: java.net.SocketException: Too many open files at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390) at java.net.ServerSocket.implAccept(ServerSocket.java:453) at java.net.ServerSocket.accept(ServerSocket.java:421) at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119) ... 5 more Best Regards Donna li
Re: Too many open files
What does the following error mean? One of my cassandra servers print this error, and nodetool shows the state of the server is down. Netstat result shows the socket number is very few. The operating system enforced limits have been hit, so Cassandra is unable to create additional file descriptors (so it can't open files, TCP connections, etc). The correct fix is to ensure that Cassandra is running with higher operating system enforced limits (see ulimit, /etc/security/limits.conf, etc). Cassandra is not expected to deal with this type of error gracefully and you will want to restart nodes that run into this. -- / Peter Schuller (@scode on twitter)
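A sketch of the permanent version of the fix described above, assuming a dedicated cassandra user (the user name and values are illustrative, not recommendations):

```shell
# /etc/security/limits.conf -- raise the open-file limit for the user
# that runs Cassandra (requires pam_limits; applies at next login, and
# the daemon must be restarted from a session that picked it up):
#   cassandra  soft  nofile  100000
#   cassandra  hard  nofile  100000

# Quick check of what the current session is allowed:
ulimit -n
```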
Re: Too many open files during Repair operation
I'm guessing you've seen this already? http://www.datastax.com/docs/0.8/troubleshooting/index#java-reports-an-error-saying-there-are-too-many-open-files Check out the # of file descriptors opened with the lsof -n | grep java command. On Tue, Jul 19, 2011 at 8:30 AM, cbert...@libero.it cbert...@libero.it wrote: Hi all. In production we want to run nodetool repair but each time we do it we get the too many open files error. We've increased the number of available FDs for Cassandra to 8192 but still we get the same error after a few seconds. Should I increase it more? WARN [Thread-7] 2011-07-19 12:34:00,348 CustomTThreadPoolServer.java (line 131) Transport error occurred during acceptance of message. org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124) at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:68) at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:39) at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31) at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:121) at org.apache.cassandra.thrift.CassandraDaemon$ThriftServer.run(CassandraDaemon.java:155) Caused by: java.net.SocketException: Too many open files at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at java.net.ServerSocket.accept(ServerSocket.java:430) at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119) ... 5 more nodetool repair keyspacename -h host Cassandra 0.7.5, 1 cluster, 5 nodes. Each node gives the same output. One more question: when repair starts throwing this kind of exception (very fast) we stop the repair process ... is it dangerous for the data? Best Regards Carlo
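One caveat when counting with lsof as above: some lsof versions list one row per JVM thread, which can inflate the apparent count enormously. A per-process count straight from /proc may be more honest (Linux only; the current shell's pid is used here just so the sketch runs, substitute the Cassandra JVM's pid):

```shell
PID=$$                       # illustrative: use the Cassandra JVM's pid instead
ls /proc/"$PID"/fd | wc -l   # distinct descriptors actually held by that process
```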
Re: Too many open files during Repair operation
If you are using Linux, especially Ubuntu, check the linked document below. This is my favorite: Using sudo has side effects in terms of open file limits. On Ubuntu they’ll be reset to 1024, no matter what’s set in /etc/security/limits.conf http://wiki.basho.com/Open-Files-Limit.html /Attila
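Given the sudo caveat above, it is worth verifying the limit the daemon actually inherited rather than trusting the login shell's ulimit. A Linux-only sketch (shown against the current process so it runs as-is; substitute the Cassandra pid):

```shell
# The 'Max open files' row shows the soft and hard limits in force
# for the process, regardless of what any shell reports:
grep "Max open files" /proc/self/limits
```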
Re: too many open files - maybe a fd leak in indexslicequeries
Sounds like they haven't been munmapped yet. Try forcing a GC. On Sat, Apr 2, 2011 at 5:38 AM, Roland Gude roland.g...@yoochoose.com wrote: Hi, The open file limit is 1024. Sstable count is somewhere around 20 or so, thread count is in the same order of magnitude I guess. But lsof shows that deleted sstables still have open file handles. This seems to be the issue, as this number keeps growing. Any ideas? Roland. -----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Friday, 1 April 2011 06:07 To: user@cassandra.apache.org Cc: Roland Gude; Juergen Link; Johannes Hoerle Subject: Re: too many open files - maybe a fd leak in indexslicequeries Index queries (ColumnFamilyStore.scan) don't do any low-level i/o themselves, they go through CFS.getColumnFamily, which is what normal row fetches also go through. So if there is a leak there it's unlikely to be specific to indexes. What is your open-file limit (remember that sockets count towards this), thread count, sstable count? On Thu, Mar 31, 2011 at 4:15 PM, Roland Gude roland.g...@yoochoose.com wrote: I experience something that looks exactly like https://issues.apache.org/jira/browse/CASSANDRA-1178 on cassandra 0.7.3 when using index slice queries (lots of them), crashing multiple nodes and rendering the cluster useless. But I have no clue where to look if index queries still leak fds. Does anybody know about it? Where could I look? Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. 
Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
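The deleted-but-still-open sstables Roland observed with lsof can also be spotted from /proc: fd symlinks whose targets are marked "(deleted)" still pin disk space and count against the limit. A sketch, using the current process so it runs as-is (substitute the Cassandra pid in practice):

```shell
# Count descriptors whose backing file has been deleted (0 is healthy;
# a growing count suggests a handle leak like the one in this thread):
ls -l /proc/self/fd 2>/dev/null | grep -c '(deleted)' || true
```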
Re: too many open files - maybe a fd leak in indexslicequeries
Hi, The open file limit is 1024. Sstable count is somewhere around 20 or so, thread count is in the same order of magnitude I guess. But lsof shows that deleted sstables still have open file handles. This seems to be the issue, as this number keeps growing. Any ideas? Roland. -----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Friday, 1 April 2011 06:07 To: user@cassandra.apache.org Cc: Roland Gude; Juergen Link; Johannes Hoerle Subject: Re: too many open files - maybe a fd leak in indexslicequeries Index queries (ColumnFamilyStore.scan) don't do any low-level i/o themselves, they go through CFS.getColumnFamily, which is what normal row fetches also go through. So if there is a leak there it's unlikely to be specific to indexes. What is your open-file limit (remember that sockets count towards this), thread count, sstable count? On Thu, Mar 31, 2011 at 4:15 PM, Roland Gude roland.g...@yoochoose.com wrote: I experience something that looks exactly like https://issues.apache.org/jira/browse/CASSANDRA-1178 on cassandra 0.7.3 when using index slice queries (lots of them), crashing multiple nodes and rendering the cluster useless. But I have no clue where to look if index queries still leak fds. Does anybody know about it? Where could I look? Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
too many open files - maybe a fd leak in indexslicequeries
I experience something that looks exactly like https://issues.apache.org/jira/browse/CASSANDRA-1178 on cassandra 0.7.3 when using index slice queries (lots of them), crashing multiple nodes and rendering the cluster useless. But I have no clue where to look if index queries still leak fds. Does anybody know about it? Where could I look? Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln
Re: too many open files - maybe a fd leak in indexslicequeries
Index queries (ColumnFamilyStore.scan) don't do any low-level i/o themselves, they go through CFS.getColumnFamily, which is what normal row fetches also go through. So if there is a leak there it's unlikely to be specific to indexes. What is your open-file limit (remember that sockets count towards this), thread count, sstable count? On Thu, Mar 31, 2011 at 4:15 PM, Roland Gude roland.g...@yoochoose.com wrote: I experience something that looks exactly like https://issues.apache.org/jira/browse/CASSANDRA-1178 On cassandra 0.7.3 when using index slice queries (lots of them) Crashing multiple nodes and rendering the cluster useless. But I have no clue where to look if index queries still leak fd Does anybody know about it? Where could I look? Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Too many open files Exception + java.lang.ArithmeticException: / by zero
I increased the allowed number of file descriptors to unlimited. Now, I get exactly the same exception after 3.50 rows: CustomTThreadPoolServer.java (line 104) Transport error occurred during acceptance of message. org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files What worries me is this / by zero exception when I try to restart cassandra! At least, I want to back up the 3.50 rows to continue my insertion afterwards; is there a way to do this? Exception encountered during startup. java.lang.ArithmeticException: / by zero at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233) Thanks. 2010/12/15 Jake Luciani jak...@gmail.com http://www.riptano.com/docs/0.6/troubleshooting/index#java-reports-an-error-saying-there-are-too-many-open-files On Wed, Dec 15, 2010 at 11:13 AM, Amin Sakka, Novapost amin.sa...@novapost.fr wrote: Hello, I'm using cassandra 0.7.0 rc1, a single node configuration, replication factor 1, random partitioner, 2 GB heap size. I ran my hector client to insert 5.000.000 rows but after a couple of hours, the following Exception occurs: WARN [main] 2010-12-15 16:38:53,335 CustomTThreadPoolServer.java (line 104) Transport error occurred during acceptance of message. 
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124) at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:67) at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:38) at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31) at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98) at org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:120) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:229) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) Caused by: java.net.SocketException: Too many open files at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384) at java.net.ServerSocket.implAccept(ServerSocket.java:453) at java.net.ServerSocket.accept(ServerSocket.java:421) at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119) *When I try to restart Cassandra, I have the following exception :* ERROR 16:42:26,573 Exception encountered during startup. 
java.lang.ArithmeticException: / by zero at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233) at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:284) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200) at org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:225) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) at org.apache.cassandra.db.ColumnFamilyStore.addIndex(ColumnFamilyStore.java:306) at org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:246) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:437) at org.apache.cassandra.db.Table.initCf(Table.java:341) at org.apache.cassandra.db.Table.init(Table.java:283) at org.apache.cassandra.db.Table.open(Table.java:114) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:138) at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) I am looking for advice on how to debug this. Thanks, -- Amin -- Amin SAKKA Research and Development Engineer 32 rue de Paradis, 75010 Paris *Tel:* +33 (0)6 34 14 19 25 *Mail:* amin.sa...@novapost.fr *Web:* www.novapost.fr / www.novapost-rh.fr
Re: Too many open files Exception + java.lang.ArithmeticException: / by zero
Be careful with the unlimited value on ulimit, you could end up with an unresponsive server... I mean, you could not even connect via ssh if you don't have enough handles. On Thu, Dec 16, 2010 at 9:59 AM, Amin Sakka, Novapost amin.sa...@novapost.fr wrote: I increased the allowed number of file descriptors to unlimited. Now, I get exactly the same exception after 3.50 rows: CustomTThreadPoolServer.java (line 104) Transport error occurred during acceptance of message. org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files What worries me is this / by zero exception when I try to restart cassandra! At least, I want to back up the 3.50 rows to continue my insertion afterwards; is there a way to do this? Exception encountered during startup. java.lang.ArithmeticException: / by zero at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233) Thanks. 2010/12/15 Jake Luciani jak...@gmail.com http://www.riptano.com/docs/0.6/troubleshooting/index#java-reports-an-error-saying-there-are-too-many-open-files On Wed, Dec 15, 2010 at 11:13 AM, Amin Sakka, Novapost amin.sa...@novapost.fr wrote: Hello, I'm using cassandra 0.7.0 rc1, a single node configuration, replication factor 1, random partitioner, 2 GB heap size. I ran my hector client to insert 5.000.000 rows but after a couple of hours, the following Exception occurs: WARN [main] 2010-12-15 16:38:53,335 CustomTThreadPoolServer.java (line 104) Transport error occurred during acceptance of message. 
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124) at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:67) at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:38) at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31) at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98) at org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:120) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:229) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) Caused by: java.net.SocketException: Too many open files at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384) at java.net.ServerSocket.implAccept(ServerSocket.java:453) at java.net.ServerSocket.accept(ServerSocket.java:421) at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119) *When I try to restart Cassandra, I have the following exception :* ERROR 16:42:26,573 Exception encountered during startup. 
java.lang.ArithmeticException: / by zero at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233) at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:284) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200) at org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:225) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) at org.apache.cassandra.db.ColumnFamilyStore.addIndex(ColumnFamilyStore.java:306) at org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:246) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:437) at org.apache.cassandra.db.Table.initCf(Table.java:341) at org.apache.cassandra.db.Table.init(Table.java:283) at org.apache.cassandra.db.Table.open(Table.java:114) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:138) at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) I am looking for advice on how to debug this. Thanks, -- Amin -- Amin SAKKA Research and Development Engineer 32 rue de Paradis, 75010 Paris *Tel:* +33 (0)6 34 14 19 25 *Mail:* amin.sa...@novapost.fr *Web:* www.novapost.fr / www.novapost-rh.fr -- //GK german.kond...@gmail.com // sites http://twitter.com/germanklf http://www.facebook.com/germanklf http://ar.linkedin.com/in/germankondolf
Re: Too many open files Exception + java.lang.ArithmeticException: / by zero
How many sstable Data.db files do you see in your system and how big are they? Also, how big are the rows you are inserting? On Thu, Dec 16, 2010 at 7:59 AM, Amin Sakka, Novapost amin.sa...@novapost.fr wrote: I increased the allowed number of file descriptors to unlimited. Now, I get exactly the same exception after 3.50 rows: CustomTThreadPoolServer.java (line 104) Transport error occurred during acceptance of message. org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files What worries me is this / by zero exception when I try to restart cassandra! At least, I want to back up the 3.50 rows to continue my insertion afterwards; is there a way to do this? Exception encountered during startup. java.lang.ArithmeticException: / by zero at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233) Thanks. 2010/12/15 Jake Luciani jak...@gmail.com http://www.riptano.com/docs/0.6/troubleshooting/index#java-reports-an-error-saying-there-are-too-many-open-files On Wed, Dec 15, 2010 at 11:13 AM, Amin Sakka, Novapost amin.sa...@novapost.fr wrote: Hello, I'm using cassandra 0.7.0 rc1, a single node configuration, replication factor 1, random partitioner, 2 GB heap size. I ran my hector client to insert 5.000.000 rows but after a couple of hours, the following Exception occurs: WARN [main] 2010-12-15 16:38:53,335 CustomTThreadPoolServer.java (line 104) Transport error occurred during acceptance of message. 
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124) at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:67) at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:38) at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31) at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98) at org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:120) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:229) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) Caused by: java.net.SocketException: Too many open files at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384) at java.net.ServerSocket.implAccept(ServerSocket.java:453) at java.net.ServerSocket.accept(ServerSocket.java:421) at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119) *When I try to restart Cassandra, I have the following exception :* ERROR 16:42:26,573 Exception encountered during startup. 
java.lang.ArithmeticException: / by zero at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233) at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:284) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200) at org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:225) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) at org.apache.cassandra.db.ColumnFamilyStore.addIndex(ColumnFamilyStore.java:306) at org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:246) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:437) at org.apache.cassandra.db.Table.initCf(Table.java:341) at org.apache.cassandra.db.Table.init(Table.java:283) at org.apache.cassandra.db.Table.open(Table.java:114) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:138) at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) I am looking for advice on how to debug this. Thanks, -- Amin -- Amin SAKKA Research and Development Engineer 32 rue de Paradis, 75010 Paris *Tel:* +33 (0)6 34 14 19 25 *Mail:* amin.sa...@novapost.fr *Web:* www.novapost.fr / www.novapost-rh.fr
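A quick way to check whether a limit increase actually took effect for the running daemon, and how many descriptors it really holds, is to read /proc directly. This is a sketch using standard Linux tools; note that piping `lsof` through `grep java | wc -l` can overcount, since lsof may list the same descriptor once per thread.

```shell
# Inspect the fd usage of a process. For Cassandra you would use e.g.
#   PID=$(pgrep -f CassandraDaemon)
# We default to the current shell so the commands run anywhere.
PID=${PID:-$$}

# One directory entry per open fd -- more accurate than counting lsof lines.
FD_COUNT=$(ls /proc/$PID/fd | wc -l)
echo "open fds: $FD_COUNT"

# The limit the kernel actually enforces for that process:
grep 'open files' /proc/$PID/limits

# The soft limit that new child processes of this shell will inherit:
ulimit -n
```

Comparing the per-process count against the "Max open files" row of /proc/$PID/limits shows immediately whether the daemon picked up the new limit or is still running under the old one.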
Re: Too many open files Exception + java.lang.ArithmeticException: / by zero
Are you creating a new connection for each row you insert (and if so, are you closing it)? -ryan On Wed, Dec 15, 2010 at 8:13 AM, Amin Sakka, Novapost amin.sa...@novapost.fr wrote: [original message and stack traces snipped]
Re: Too many open files Exception + java.lang.ArithmeticException: / by zero
You probably want to switch to using mutator#addInsertion for some number of iterations (start with 1000 and adjust as needed), then call execute(). This will be much more efficient. On Thu, Dec 16, 2010 at 11:39 AM, Amin Sakka, Novapost amin.sa...@novapost.fr wrote: I'm using a unique client instance (using Hector) and a unique connection to Cassandra. For each insertion I'm using a new mutator and then I release it. I have 473 sstable Data.db files; the average size of each is 30 MB. 2010/12/16 Ryan King r...@twitter.com: Are you creating a new connection for each row you insert (and if so, are you closing it)? -ryan [earlier quoted messages and stack traces snipped]
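The batching advice generalizes beyond Hector. As a sketch, buffering N insertions per round trip looks like the following; the `BatchWriter` name and the flush callback are illustrative, not Hector's API, and with Hector the flush step is where a `mutator.execute()` call would go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch of the batching pattern suggested above: buffer
// insertions and flush every `batchSize` operations instead of doing one
// round trip per row. With Hector, `flusher` would wrap mutator.execute().
class BatchWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher;
    private final List<T> buffer = new ArrayList<>();
    private int flushCount = 0;

    BatchWriter(int batchSize, Consumer<List<T>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    void add(T op) {
        buffer.add(op);
        if (buffer.size() >= batchSize) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        flusher.accept(new ArrayList<>(buffer)); // e.g. mutator.execute()
        buffer.clear();
        flushCount++;
    }

    int flushes() { return flushCount; }
}
```

Tuning `batchSize` trades client memory against round trips; 1000 is only the suggested starting point from the reply above.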
Re: Too many open files Exception + java.lang.ArithmeticException: / by zero
Indeed, Hector has a connection pool behind it; I think it uses 50 connections per node. It also uses one node to discover the others; I assume that because I saw connections from my app to nodes that I didn't configure in Hector. So you may want to check the fds at the OS level to see if there is a bottleneck there. On Thu, Dec 16, 2010 at 2:39 PM, Amin Sakka, Novapost amin.sa...@novapost.fr wrote: I'm using a unique client instance (using Hector) and a unique connection to Cassandra. For each insertion I'm using a new mutator and then I release it. I have 473 sstable Data.db files; the average size of each is 30 MB. [earlier quoted messages and stack traces snipped] -- //GK german.kond...@gmail.com // sites http://twitter.com/germanklf http://www.facebook.com/germanklf http://ar.linkedin.com/in/germankondolf
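Since Hector pools connections per client instance, the usual pattern is one long-lived client shared by all threads, created once per process rather than per request. A minimal sketch of that pattern follows; `FakeClient` is a stand-in for Hector's cluster/keyspace handle, and the names are illustrative, not Hector's API.

```java
// Stand-in for Hector's Cluster/Keyspace objects -- illustrative only.
class FakeClient {
    final String hosts;
    FakeClient(String hosts) { this.hosts = hosts; }
}

final class SharedClient {
    // Created once at class load; every caller reuses the same instance,
    // so the number of open connections stays bounded by the pool size
    // rather than growing with the request count.
    private static final FakeClient INSTANCE = new FakeClient("localhost:9160");

    static FakeClient get() { return INSTANCE; }

    private SharedClient() {}
}
```

Creating (and especially failing to shut down) a fresh client per request is exactly what leaks server-side connections until the node hits its fd limit.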
Too many open files Exception + java.lang.ArithmeticException: / by zero
Hello, I'm using Cassandra 0.7.0 rc1, a single node configuration, replication factor 1, random partitioner, 2 GB heap size. I ran my Hector client to insert 5.000.000 rows, but after a couple of hours the following exception occurs: WARN [main] 2010-12-15 16:38:53,335 CustomTThreadPoolServer.java (line 104) Transport error occurred during acceptance of message. org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124) at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:67) at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:38) at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31) at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98) at org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:120) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:229) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) Caused by: java.net.SocketException: Too many open files at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384) at java.net.ServerSocket.implAccept(ServerSocket.java:453) at java.net.ServerSocket.accept(ServerSocket.java:421) at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119) When I try to restart Cassandra, I get the following exception: ERROR 16:42:26,573 Exception encountered during startup.
java.lang.ArithmeticException: / by zero at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233) at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:284) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200) at org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:225) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) at org.apache.cassandra.db.ColumnFamilyStore.addIndex(ColumnFamilyStore.java:306) at org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:246) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:437) at org.apache.cassandra.db.Table.initCf(Table.java:341) at org.apache.cassandra.db.Table.init(Table.java:283) at org.apache.cassandra.db.Table.open(Table.java:114) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:138) at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) I am looking for advice on how to debug this; any ideas, please? Thanks, -- Amin
Re: Too many open files Exception + java.lang.ArithmeticException: / by zero
http://www.riptano.com/docs/0.6/troubleshooting/index#java-reports-an-error-saying-there-are-too-many-open-files On Wed, Dec 15, 2010 at 11:13 AM, Amin Sakka, Novapost amin.sa...@novapost.fr wrote: [original message and stack traces snipped]
Re: too many open files 0.7.0 beta1
That looks like it. I've pushed the limits up to 65k and turned down the testing for now; otherwise machines were dropping like flies. Thanks. Aaron On 26 Aug, 2010, at 04:16 PM, Dan Washusen d...@reactive.org wrote: Maybe you're seeing this: https://issues.apache.org/jira/browse/CASSANDRA-1416 On Thu, Aug 26, 2010 at 2:05 PM, Aaron Morton aa...@thelastpickle.com wrote: [original message snipped]
too many open files 0.7.0 beta1
Under 0.7.0 beta1 I am seeing Cassandra run out of file handles... Caused by: java.io.FileNotFoundException: /local1/junkbox/cassandra/data/junkbox.wetafx.co.nz/ObjectIndex-e-31-Index.db (Too many open files) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98) at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142) If I look at the file descriptors for the process I can see it already has 1,958 open for that file: sudo ls -l /proc/20862/fd | grep "ObjectIndex-e-31-Data.db" | wc -l 1958 Out of a total of 2044. Other nodes in the cluster have a similar number of fds, around 2k, with the majority pointing to one SSTable. I did not experience this under 0.6, so I'm just checking whether this sounds OK and I should simply increase the number of handles, or whether it's a bug? Thanks Aaron
Re: too many open files 0.7.0 beta1
Maybe you're seeing this: https://issues.apache.org/jira/browse/CASSANDRA-1416 On Thu, Aug 26, 2010 at 2:05 PM, Aaron Morton aa...@thelastpickle.com wrote: [original message snipped]
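Raising the limit to 65k as Aaron did is typically two steps: a per-shell soft limit for quick tests, and a persistent entry for the user Cassandra runs as. A sketch, assuming a Linux system using PAM's limits module; the "cassandra" user name is illustrative:

```shell
# Show the current soft limit for open files in this shell:
ulimit -n

# Try to raise it for this shell and its children (bounded by the hard limit;
# only root can raise the hard limit):
ulimit -n 65536 2>/dev/null || echo "hard limit too low; raise it as root"

# For a persistent change, add lines like these to /etc/security/limits.conf
# (assuming Cassandra runs as the "cassandra" user), then log in again:
#   cassandra  soft  nofile  65536
#   cassandra  hard  nofile  65536
```

Note that limits.conf only applies to PAM login sessions; daemons started by an init system may need the limit set in their service configuration instead.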
Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]
SocketException means this is coming from the network, not the sstables. Knowing the full error message would be nice, but just about any problem on that end should be fixed by adding connection pooling to your client. (moving to user@) On Wed, Jul 14, 2010 at 5:09 AM, Thomas Downing tdown...@proteus-technologies.com wrote: On 7/13/2010 9:20 AM, Jonathan Ellis wrote: On Tue, Jul 13, 2010 at 4:19 AM, Thomas Downing tdown...@proteus-technologies.com wrote: On a related note: I am running some feasibility tests looking for high ingest rate capabilities. While testing Cassandra, the problem I've encountered is that it runs out of file handles during compaction. This usually just means increase the allowed fh via ulimit/etc. Increasing the memtable thresholds so that you create fewer but larger sstables is also a good idea. The defaults are small so Cassandra can work on a 1GB heap, which is much smaller than most production ones. Reasonable rule of thumb: if you have a heap of N GB, increase both the throughput and count thresholds by N times. Thanks for the suggestion. I gave it a whirl, but no go. The file handles in use stayed at around 500 for the first 30M or so mutates, then within 4 seconds they jumped to about 800, stayed there for about 30 seconds, then within 5 seconds went over 2022, at which point the server entered the cycle of SocketException: Too many open files. The interesting thing is that the file limit for this process is 32768. Note the numbers below as well. If there is anything specific you would like me to try, let me know. It seems like there's some sort of non-linear behavior here. This behavior is the same as before I multiplied the Cassandra params by 4 (the number of GB), which leads me to think that increasing limits, whether file handles or Cassandra parameters, is likely to be a tail-chasing exercise. This causes time-out exceptions at the client. On this exception, my client closes the connection, waits a bit, then retries.
After a few hours of this the server still had not recovered. I killed the clients and watched the server after that. The open file handles dropped by 8 and have stayed there. The server is, of course, not throwing SocketException any more. On the other hand, the server is not doing anything at all. When there is no client activity and the server is idle, there are 155 threads running in the JVM. They are all in one of three states: almost all blocked at futex(), a few blocked at accept(), and a few cycling over timeout on futex(), gettimeofday(), futex()... None are blocked on IO. I can't attach a debugger; I get IO exceptions trying either socket or local connections (no surprise), so I don't know of a way to find the Java code where the threads are blocking. More than one fd can be open on a given file, and many of the open fd's are on files that have been deleted. The stale fd's are all on Data.db files in the data directory, which I keep separate from the commit log directory. I haven't had a chance to look at the code handling files, and I am not any sort of Java expert, but could this be due to Java's lazy resource cleanup? I wonder whether, when considering writing your own file handling classes for O_DIRECT or posix_fadvise or whatever, an explicit close(2) might help. A restart of the client causes immediate SocketExceptions at the server and timeouts at the client. I noted on the restart that the open fd's jumped by 32, despite only making 4 connections. At this point there were 2028 open files, more than there were when the exceptions began at 2002 open files. So it seems like the exception is not caused by the OS returning EMFILE, unless it was returning EMFILE for some strange reason and the bump in open files is due to an increase in duplicate open files. (BTW, it's not ENFILE!) I also noted that although the TimeoutExceptions did not occur immediately on the client, the SocketExceptions began immediately on the server. This does not seem to match up.
I am using the org.apache.cassandra.thrift API directly, not any higher level wrapper. Finally, this jump to 2028 on the restart caused a new symptom. I only had the client running a few seconds, but after 15 minutes, the server is still throwing exceptions, even though the open file handles immediately dropped from 2028 down to 1967. Thanks for your attention, and all your work, Thomas Downing -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
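Jonathan's rule of thumb above translates into the 0.6-era storage-conf.xml memtable settings. A hedged example for a 4 GB heap, scaling the stock 0.6 defaults (64 MB throughput, 0.3 million operations) by 4x; exact defaults vary by release, so treat the numbers as illustrative:

```xml
<!-- storage-conf.xml fragment (Cassandra 0.6.x); illustrative values for a
     4 GB heap: roughly 4x the shipped defaults, per the "N GB heap =>
     N x thresholds" rule of thumb above. Larger memtables mean fewer,
     bigger sstables and therefore fewer open file handles. -->
<MemtableThroughputInMB>256</MemtableThroughputInMB>
<MemtableOperationsInMillions>1.2</MemtableOperationsInMillions>
```

These are per-column-family settings, so the memory cost scales with the number of column families as well as with the thresholds themselves.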
Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]
[snip] I'm not sure that is the case. When the server gets into the unrecoverable state, the repeating exceptions are indeed SocketException: Too many open files. [snip] Although this is unquestionably a network error, I don't think it is actually a network problem per se, as the maximum number of sockets open by the Cassandra server is at this point is about 8. When I kill the client, sockets held are just the listening sockets - no sockets in ESTABLISHED or TIMED_WAIT. Is this based on netstat or lsof or similar? When the node is in the state of giving these errors, try inspecting /proc/pid/fd or use lsof. Presumably you'll see thousands of fds of some category; either sockets or files. (If you already did this, sorry!) -- / Peter Schuller
Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]
Thomas, I had a similar problem a few weeks back. I changed my code to make sure that each thread only creates and uses one Hector connection. It seems that client sockets are not being released properly, but I didn't have the time to dig into it. Jorge On Wed, Jul 14, 2010 at 8:28 AM, Peter Schuller peter.schul...@infidyne.com wrote: [quoted message snipped]
Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]
Each of my top-level functions was allocating a Hector client connection at the top and releasing it when returning. The problem arose when a top-level function had to call another top-level function, which led to the same thread allocating two connections. Hector was not releasing one of them even though I was explicitly requesting them to be released. This might have been fixed since then, and like I said, I didn't dig into why it was happening. I just made sure to pass along the connection instances as necessary and the problem went away. On Wed, Jul 14, 2010 at 11:40 AM, shimi shim...@gmail.com wrote: Do you mean that you don't release the connection back to the pool? On 2010 7 14 20:51, Jorge Barrios jo...@tapulous.com wrote: Thomas, I had a similar problem a few weeks back. I changed my code to make sure that each thread only creates and uses one Hector connection. It seems that client sockets are not being released properly, but I didn't have the time to dig into it. Jorge On Wed, Jul 14, 2010 at 8:28 AM, Peter Schuller peter.schul...@infidyne.com wrote: [snip] ...
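Jorge's fix, acquire once at the entry point and pass the handle down, can be sketched generically. The `Pool`/`Conn` types below are illustrative stand-ins, not Hector's API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-in for a pooled client connection -- illustrative only.
class Conn {}

// Minimal borrow/release pool to demonstrate the discipline.
class Pool {
    private final Deque<Conn> idle = new ArrayDeque<>();
    private int borrowed = 0;

    Pool(int size) { for (int i = 0; i < size; i++) idle.push(new Conn()); }

    synchronized Conn borrow() {
        if (idle.isEmpty()) throw new IllegalStateException("pool exhausted");
        borrowed++;
        return idle.pop();
    }

    synchronized void release(Conn c) { borrowed--; idle.push(c); }
    synchronized int inUse() { return borrowed; }
}

class Handlers {
    // Entry point: borrow once, release in finally, pass the handle down.
    static void outer(Pool pool) {
        Conn c = pool.borrow();
        try {
            inner(c); // reuse the handle instead of borrowing a second one
        } finally {
            pool.release(c);
        }
    }

    // Nested logic takes the connection as a parameter; it never borrows.
    static void inner(Conn c) { /* do work with c */ }
}
```

With this shape, a thread can never hold two connections at once, and the `finally` block guarantees the release even when the nested call throws.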