Re: Out of Memory Error While Opening SSTables on Startup

2015-03-20 Thread Jan
Paul Nickerson: curious, did you get a solution to your problem?
Regards,
Jan



On Tuesday, February 10, 2015 5:48 PM, Flavien Charlon flavien.char...@gmail.com wrote:

 I already experienced the same problem (hundreds of thousands of SSTables) 
with Cassandra 2.1.2. It seems to appear when running an incremental repair 
while there is a medium to high insert load on the cluster. The repair goes in 
a bad state and starts creating way more SSTables than it should (even when 
there should be nothing to repair).
On 10 February 2015 at 15:46, Eric Stevens migh...@gmail.com wrote:

This kind of recovery is definitely not my strong point, so feedback on this 
approach would certainly be welcome.
As I understand it, if you really want to keep that data, you ought to be able
to mv it out of the way to get your node online, then move those files back in,
a few thousand at a time, and run nodetool refresh OpsCenter rollups60 &&
nodetool compact OpsCenter rollups60; rinse and repeat.  This should let you
incrementally restore the data in that keyspace without putting so many
sstables in there that it OOMs your cluster again.
On Tue, Feb 10, 2015 at 3:38 PM, Chris Lohfink clohfin...@gmail.com wrote:

yeah... probably just 2.1.2 things and not compactions.  Still probably want to
do something about the 1.6 million files though.  It may be worth just
mv/rm'ing the 60 sec rollup data unless you're really attached to it.
Chris
On Tue, Feb 10, 2015 at 4:04 PM, Paul Nickerson pgn...@gmail.com wrote:

I was having trouble with snapshots failing while trying to repair that table 
(http://www.mail-archive.com/user@cassandra.apache.org/msg40686.html). I have a 
repair running on it now, and it seems to be going successfully this time. I am 
going to wait for that to finish, then try a manual nodetool compact. If that 
goes successfully, then would it be safe to chalk the lack of compaction on 
this table in the past up to 2.1.2 problems?

 ~ Paul Nickerson
On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink clohfin...@gmail.com wrote:

Your cluster is probably having issues with compactions (with STCS you should
never have this many).  I would probably punt with OpsCenter/rollups60. Turn
the node off and move all of the sstables off to a different directory for
backup (or just rm them if you really don't care about 1 minute metrics), then
turn the server back on.
Once you get your cluster running again, go back and investigate why compactions
stopped; my guess is you hit an exception in the past that killed your
CompactionExecutor and things just built up slowly until you got to this point.
Chris
On Tue, Feb 10, 2015 at 2:15 PM, Paul Nickerson pgn...@gmail.com wrote:

Thank you Rob. I tried a 12 GiB heap size, and still crashed out. There are 
1,617,289 files under OpsCenter/rollups60.
Once I downgraded Cassandra to 2.1.1 (apt-get install cassandra=2.1.1), I was 
able to start up Cassandra OK with the default heap size formula.
Now my cluster is running multiple versions of Cassandra. I think I will 
downgrade the rest to 2.1.1.
 ~ Paul Nickerson
On Tue, Feb 10, 2015 at 2:05 PM, Robert Coli rc...@eventbrite.com wrote:

On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson pgn...@gmail.com wrote:

I am getting an out of memory error when I try to start Cassandra on one of my
nodes. Cassandra will run for a minute, and then exit without outputting any 
error in the log file. It is happening while SSTableReader is opening a couple 
hundred thousand things.
... 
Does anyone know how I might get Cassandra on this node running again? I'm not 
very familiar with correctly tuning Java memory parameters, and I'm not sure if 
that's the right solution in this case anyway.

Try running 2.1.1, and/or increasing heap size beyond 8gb.
Are there actually that many SSTables on disk?
=Rob 


Re: Out of Memory Error While Opening SSTables on Startup

2015-02-10 Thread Robert Coli
On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson pgn...@gmail.com wrote:

 I am getting an out of memory error when I try to start Cassandra on one of
 my nodes. Cassandra will run for a minute, and then exit without outputting
 any error in the log file. It is happening while SSTableReader is opening a
 couple hundred thousand things.

...

 Does anyone know how I might get Cassandra on this node running again? I'm
 not very familiar with correctly tuning Java memory parameters, and I'm not
 sure if that's the right solution in this case anyway.


Try running 2.1.1, and/or increasing heap size beyond 8gb.

Are there actually that many SSTables on disk?

=Rob
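
For anyone wanting to answer that last question quickly, a rough shell check, assuming the default data directory layout of /var/lib/cassandra/data and that the table in question is OpsCenter/rollups60 (both assumptions; adjust to your data_file_directories setting):

    # Each live SSTable has exactly one *-Data.db component, so this approximates
    # the SSTable count for the rollups60 table (path below is an assumption).
    find /var/lib/cassandra/data/OpsCenter/rollups60* -name '*-Data.db' | wc -l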


Re: Out of Memory Error While Opening SSTables on Startup

2015-02-10 Thread Chris Lohfink
Your cluster is probably having issues with compactions (with STCS you
should never have this many).  I would probably punt with
OpsCenter/rollups60. Turn the node off and move all of the sstables off to
a different directory for backup (or just rm them if you really don't care
about 1 minute metrics), then turn the server back on.

Once you get your cluster running again, go back and investigate why
compactions stopped; my guess is you hit an exception in the past that killed
your CompactionExecutor and things just built up slowly until you got to
this point.

Chris
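
A rough sketch of that move-aside step, assuming a package install with the default data directory /var/lib/cassandra/data and a service-managed daemon (all assumptions; adjust the paths, backup location, and stop/start commands to your setup):

    # Paths and service names below are assumptions for a package install.
    # Stop the node, park the whole rollups60 directory as a backup, recreate it
    # empty, and restart.  Moving the directory rather than its contents avoids
    # "argument list too long" with ~1.6 million files.
    sudo service cassandra stop
    sudo mv /var/lib/cassandra/data/OpsCenter/rollups60 /var/backups/rollups60
    sudo mkdir /var/lib/cassandra/data/OpsCenter/rollups60
    sudo chown cassandra:cassandra /var/lib/cassandra/data/OpsCenter/rollups60
    sudo service cassandra start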

On Tue, Feb 10, 2015 at 2:15 PM, Paul Nickerson pgn...@gmail.com wrote:

 Thank you Rob. I tried a 12 GiB heap size, and still crashed out. There
 are 1,617,289 files under OpsCenter/rollups60.

 Once I downgraded Cassandra to 2.1.1 (apt-get install cassandra=2.1.1), I
 was able to start up Cassandra OK with the default heap size formula.

 Now my cluster is running multiple versions of Cassandra. I think I will
 downgrade the rest to 2.1.1.

  ~ Paul Nickerson
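
For reference, the heap override being tested there normally lives in cassandra-env.sh on 2.1; a sketch of what such an override looks like (the 12G figure mirrors the test above rather than a recommendation, and the file path assumes a package install):

    # /etc/cassandra/cassandra-env.sh (package-install path; adjust if needed).
    # Leaving both lines commented keeps the default formula:
    #   max(min(1/2 * RAM, 1024 MB), min(1/4 * RAM, 8192 MB))
    # cassandra-env.sh expects either both values set or both left unset.
    MAX_HEAP_SIZE="12G"
    HEAP_NEWSIZE="1200M"   # commonly sized at roughly 100 MB per physical core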

 On Tue, Feb 10, 2015 at 2:05 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson pgn...@gmail.com
 wrote:

 I am getting an out of memory error when I try to start Cassandra on one
 of my nodes. Cassandra will run for a minute, and then exit without
 outputting any error in the log file. It is happening while SSTableReader
 is opening a couple hundred thousand things.

 ...

 Does anyone know how I might get Cassandra on this node running again?
 I'm not very familiar with correctly tuning Java memory parameters, and I'm
 not sure if that's the right solution in this case anyway.


 Try running 2.1.1, and/or increasing heap size beyond 8gb.

 Are there actually that many SSTables on disk?

 =Rob


Re: Out of Memory Error While Opening SSTables on Startup

2015-02-10 Thread Paul Nickerson
I was having trouble with snapshots failing while trying to repair that
table (http://www.mail-archive.com/user@cassandra.apache.org/msg40686.html).
I have a repair running on it now, and it seems to be going successfully
this time. I am going to wait for that to finish, then try a
manual nodetool compact. If that goes successfully, then would it be safe
to chalk the lack of compaction on this table in the past up to 2.1.2
problems?


 ~ Paul Nickerson

On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink clohfin...@gmail.com wrote:

 Your cluster is probably having issues with compactions (with STCS you
 should never have this many).  I would probably punt with
 OpsCenter/rollups60. Turn the node off and move all of the sstables off to
 a different directory for backup (or just rm them if you really don't care
 about 1 minute metrics), then turn the server back on.

 Once you get your cluster running again, go back and investigate why
 compactions stopped; my guess is you hit an exception in the past that killed
 your CompactionExecutor and things just built up slowly until you got to
 this point.

 Chris

 On Tue, Feb 10, 2015 at 2:15 PM, Paul Nickerson pgn...@gmail.com wrote:

 Thank you Rob. I tried a 12 GiB heap size, and still crashed out. There
 are 1,617,289 files under OpsCenter/rollups60.

 Once I downgraded Cassandra to 2.1.1 (apt-get install cassandra=2.1.1), I
 was able to start up Cassandra OK with the default heap size formula.

 Now my cluster is running multiple versions of Cassandra. I think I will
 downgrade the rest to 2.1.1.

  ~ Paul Nickerson

 On Tue, Feb 10, 2015 at 2:05 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson pgn...@gmail.com
 wrote:

 I am getting an out of memory error when I try to start Cassandra on one
 of my nodes. Cassandra will run for a minute, and then exit without
 outputting any error in the log file. It is happening while SSTableReader
 is opening a couple hundred thousand things.

 ...

 Does anyone know how I might get Cassandra on this node running again?
 I'm not very familiar with correctly tuning Java memory parameters, and I'm
 not sure if that's the right solution in this case anyway.


 Try running 2.1.1, and/or increasing heap size beyond 8gb.

 Are there actually that many SSTables on disk?

 =Rob


Re: Out of Memory Error While Opening SSTables on Startup

2015-02-10 Thread Chris Lohfink
yeah... probably just 2.1.2 things and not compactions.  Still probably
want to do something about the 1.6 million files though.  It may be worth
just mv/rm'ing the 60 sec rollup data unless you're really attached to it.

Chris

On Tue, Feb 10, 2015 at 4:04 PM, Paul Nickerson pgn...@gmail.com wrote:

 I was having trouble with snapshots failing while trying to repair that
 table (http://www.mail-archive.com/user@cassandra.apache.org/msg40686.html).
 I have a repair running on it now, and it seems to be going successfully
 this time. I am going to wait for that to finish, then try a
 manual nodetool compact. If that goes successfully, then would it be safe
 to chalk the lack of compaction on this table in the past up to 2.1.2
 problems?


  ~ Paul Nickerson

 On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink clohfin...@gmail.com
 wrote:

 Your cluster is probably having issues with compactions (with STCS you
 should never have this many).  I would probably punt with
 OpsCenter/rollups60. Turn the node off and move all of the sstables off to
 a different directory for backup (or just rm them if you really don't care
 about 1 minute metrics), then turn the server back on.

 Once you get your cluster running again, go back and investigate why
 compactions stopped; my guess is you hit an exception in the past that killed
 your CompactionExecutor and things just built up slowly until you got to
 this point.

 Chris

 On Tue, Feb 10, 2015 at 2:15 PM, Paul Nickerson pgn...@gmail.com wrote:

 Thank you Rob. I tried a 12 GiB heap size, and still crashed out. There
 are 1,617,289 files under OpsCenter/rollups60.

 Once I downgraded Cassandra to 2.1.1 (apt-get install cassandra=2.1.1),
 I was able to start up Cassandra OK with the default heap size formula.

 Now my cluster is running multiple versions of Cassandra. I think I will
 downgrade the rest to 2.1.1.

  ~ Paul Nickerson

 On Tue, Feb 10, 2015 at 2:05 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson pgn...@gmail.com
 wrote:

 I am getting an out of memory error when I try to start Cassandra on
 one of my nodes. Cassandra will run for a minute, and then exit without
 outputting any error in the log file. It is happening while SSTableReader
 is opening a couple hundred thousand things.

 ...

 Does anyone know how I might get Cassandra on this node running again?
 I'm not very familiar with correctly tuning Java memory parameters, and 
 I'm
 not sure if that's the right solution in this case anyway.


 Try running 2.1.1, and/or increasing heap size beyond 8gb.

 Are there actually that many SSTables on disk?

 =Rob


Re: Out of Memory Error While Opening SSTables on Startup

2015-02-10 Thread Eric Stevens
This kind of recovery is definitely not my strong point, so feedback on
this approach would certainly be welcome.

As I understand it, if you really want to keep that data, you ought to be
able to mv it out of the way to get your node online, then move those files
back in, a few thousand at a time, and run nodetool refresh OpsCenter rollups60
&& nodetool compact OpsCenter rollups60; rinse and repeat.  This should let
you incrementally restore the data in that keyspace without putting so many
sstables in there that it OOMs your cluster again.
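
A sketch of that rinse-and-repeat loop, assuming the files were parked in /var/backups/rollups60, the default data directory layout, that the parked file names don't collide with anything newly flushed into the live directory, and that moved files stay readable by the cassandra user (all assumptions; the batch size is arbitrary):

    # Feed the parked SSTables back in a couple thousand generations at a time,
    # letting nodetool refresh pick them up and nodetool compact merge them down.
    BACKUP=/var/backups/rollups60                       # where the files were parked (assumption)
    DATA=/var/lib/cassandra/data/OpsCenter/rollups60    # live table directory (assumption)
    cd "$BACKUP"
    while find . -maxdepth 1 -name '*-Data.db' | grep -q .; do
        for data in $(find . -maxdepth 1 -name '*-Data.db' | head -n 2000); do
            mv "${data%-Data.db}"-* "$DATA"/            # move every component of this generation
        done
        nodetool refresh OpsCenter rollups60
        nodetool compact OpsCenter rollups60
    done

Watching nodetool compactionstats between batches shows whether compaction is keeping up before the next refresh.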

On Tue, Feb 10, 2015 at 3:38 PM, Chris Lohfink clohfin...@gmail.com wrote:

 yeah... probably just 2.1.2 things and not compactions.  Still probably
 want to do something about the 1.6 million files though.  It may be worth
 just mv/rm'ing the 60 sec rollup data unless you're really attached to it.

 Chris

 On Tue, Feb 10, 2015 at 4:04 PM, Paul Nickerson pgn...@gmail.com wrote:

 I was having trouble with snapshots failing while trying to repair that
 table (
 http://www.mail-archive.com/user@cassandra.apache.org/msg40686.html). I
 have a repair running on it now, and it seems to be going successfully this
 time. I am going to wait for that to finish, then try a manual nodetool
 compact. If that goes successfully, then would it be safe to chalk the lack
 of compaction on this table in the past up to 2.1.2 problems?


  ~ Paul Nickerson

 On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink clohfin...@gmail.com
 wrote:

 Your cluster is probably having issues with compactions (with STCS you
 should never have this many).  I would probably punt with
 OpsCenter/rollups60. Turn the node off and move all of the sstables off to
 a different directory for backup (or just rm them if you really don't care
 about 1 minute metrics), then turn the server back on.

 Once you get your cluster running again, go back and investigate why
 compactions stopped; my guess is you hit an exception in the past that killed
 your CompactionExecutor and things just built up slowly until you got to
 this point.

 Chris

 On Tue, Feb 10, 2015 at 2:15 PM, Paul Nickerson pgn...@gmail.com
 wrote:

 Thank you Rob. I tried a 12 GiB heap size, and still crashed out. There
 are 1,617,289 files under OpsCenter/rollups60.

 Once I downgraded Cassandra to 2.1.1 (apt-get install cassandra=2.1.1),
 I was able to start up Cassandra OK with the default heap size formula.

 Now my cluster is running multiple versions of Cassandra. I think I
 will downgrade the rest to 2.1.1.

  ~ Paul Nickerson

 On Tue, Feb 10, 2015 at 2:05 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson pgn...@gmail.com
 wrote:

 I am getting an out of memory error when I try to start Cassandra on
 one of my nodes. Cassandra will run for a minute, and then exit without
 outputting any error in the log file. It is happening while SSTableReader
 is opening a couple hundred thousand things.

 ...

 Does anyone know how I might get Cassandra on this node running
 again? I'm not very familiar with correctly tuning Java memory 
 parameters,
 and I'm not sure if that's the right solution in this case anyway.


 Try running 2.1.1, and/or increasing heap size beyond 8gb.

 Are there actually that many SSTables on disk?

 =Rob



Re: Out of Memory Error While Opening SSTables on Startup

2015-02-10 Thread Flavien Charlon
I already experienced the same problem (hundreds of thousands of SSTables)
with Cassandra 2.1.2. It seems to appear when running an incremental repair
while there is a medium to high insert load on the cluster. The repair goes
in a bad state and starts creating way more SSTables than it should (even
when there should be nothing to repair).

On 10 February 2015 at 15:46, Eric Stevens migh...@gmail.com wrote:

 This kind of recovery is definitely not my strong point, so feedback on
 this approach would certainly be welcome.

 As I understand it, if you really want to keep that data, you ought to be
 able to mv it out of the way to get your node online, then move those files
 back in, a few thousand at a time, and run nodetool refresh OpsCenter rollups60
 && nodetool compact OpsCenter rollups60; rinse and repeat.  This should let
 you incrementally restore the data in that keyspace without putting so many
 sstables in there that it OOMs your cluster again.

 On Tue, Feb 10, 2015 at 3:38 PM, Chris Lohfink clohfin...@gmail.com
 wrote:

 yeah... probably just 2.1.2 things and not compactions.  Still probably
 want to do something about the 1.6 million files though.  It may be worth
 just mv/rm'ing the 60 sec rollup data unless you're really attached to it.

 Chris

 On Tue, Feb 10, 2015 at 4:04 PM, Paul Nickerson pgn...@gmail.com wrote:

 I was having trouble with snapshots failing while trying to repair that
 table (
 http://www.mail-archive.com/user@cassandra.apache.org/msg40686.html). I
 have a repair running on it now, and it seems to be going successfully this
 time. I am going to wait for that to finish, then try a manual nodetool
 compact. If that goes successfully, then would it be safe to chalk the lack
 of compaction on this table in the past up to 2.1.2 problems?


  ~ Paul Nickerson

 On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink clohfin...@gmail.com
 wrote:

 Your cluster is probably having issues with compactions (with STCS you
 should never have this many).  I would probably punt with
 OpsCenter/rollups60. Turn the node off and move all of the sstables off to
 a different directory for backup (or just rm them if you really don't care
 about 1 minute metrics), then turn the server back on.

 Once you get your cluster running again, go back and investigate why
 compactions stopped; my guess is you hit an exception in the past that killed
 your CompactionExecutor and things just built up slowly until you got to
 this point.

 Chris

 On Tue, Feb 10, 2015 at 2:15 PM, Paul Nickerson pgn...@gmail.com
 wrote:

 Thank you Rob. I tried a 12 GiB heap size, and still crashed out.
 There are 1,617,289 files under OpsCenter/rollups60.

 Once I downgraded Cassandra to 2.1.1 (apt-get install
 cassandra=2.1.1), I was able to start up Cassandra OK with the default 
 heap
 size formula.

 Now my cluster is running multiple versions of Cassandra. I think I
 will downgrade the rest to 2.1.1.

  ~ Paul Nickerson

 On Tue, Feb 10, 2015 at 2:05 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson pgn...@gmail.com
 wrote:

 I am getting an out of memory error when I try to start Cassandra on
 one of my nodes. Cassandra will run for a minute, and then exit without
 outputting any error in the log file. It is happening while 
 SSTableReader
 is opening a couple hundred thousand things.

 ...

 Does anyone know how I might get Cassandra on this node running
 again? I'm not very familiar with correctly tuning Java memory 
 parameters,
 and I'm not sure if that's the right solution in this case anyway.


 Try running 2.1.1, and/or increasing heap size beyond 8gb.

 Are there actually that many SSTables on disk?

 =Rob