This kind of recovery is definitely not my strong point, so feedback on
this approach would certainly be welcome.

As I understand it, if you really want to keep that data, you ought to be
able to mv it out of the way to get your node online, then move those files
back in a few thousand at a time, running nodetool refresh OpsCenter
rollups60 && nodetool compact OpsCenter rollups60 after each batch; rinse
and repeat.  This should let you incrementally restore the data in that
keyspace without putting so many sstables in there that it OOMs your node
again.

On Tue, Feb 10, 2015 at 3:38 PM, Chris Lohfink <clohfin...@gmail.com> wrote:

> yeah... probably just 2.1.2 things and not compactions.  Still probably
> want to do something about the 1.6 million files though.  It may be worth
> just mv/rm'ing the 60 sec rollup data unless you are really attached to it.
>
> Chris
>
> On Tue, Feb 10, 2015 at 4:04 PM, Paul Nickerson <pgn...@gmail.com> wrote:
>
>> I was having trouble with snapshots failing while trying to repair that
>> table (
>> http://www.mail-archive.com/user@cassandra.apache.org/msg40686.html). I
>> have a repair running on it now, and it seems to be going successfully this
>> time. I am going to wait for that to finish, then try a manual nodetool
>> compact. If that goes successfully, then would it be safe to chalk the lack
>> of compaction on this table in the past up to 2.1.2 problems?
>>
>>
>>  ~ Paul Nickerson
>>
>> On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink <clohfin...@gmail.com>
>> wrote:
>>
>>> Your cluster is probably having issues with compactions (with STCS you
>>> should never have this many).  I would probably punt with
>>> OpsCenter/rollups60. Turn the node off and move all of the sstables off to
>>> a different directory for backup (or just rm if you really don't care about
>>> 1 minute metrics), then turn the server back on.
>>>
>>> Once you get your cluster running again, go back and investigate why
>>> compactions stopped. My guess is you hit an exception in the past that
>>> killed your CompactionExecutor, and things just built up slowly until
>>> you got to this point.
>>>
>>> Chris
>>>
>>> On Tue, Feb 10, 2015 at 2:15 PM, Paul Nickerson <pgn...@gmail.com>
>>> wrote:
>>>
>>>> Thank you Rob. I tried a 12 GiB heap size, and it still crashed out.
>>>> There are 1,617,289 files under OpsCenter/rollups60.
>>>>
>>>> Once I downgraded Cassandra to 2.1.1 (apt-get install cassandra=2.1.1),
>>>> I was able to start up Cassandra OK with the default heap size formula.
>>>>
>>>> Now my cluster is running multiple versions of Cassandra. I think I
>>>> will downgrade the rest to 2.1.1.
>>>>
>>>>  ~ Paul Nickerson
>>>>
>>>> On Tue, Feb 10, 2015 at 2:05 PM, Robert Coli <rc...@eventbrite.com>
>>>> wrote:
>>>>
>>>>> On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson <pgn...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I am getting an out of memory error when I try to start Cassandra on
>>>>>> one of my nodes. Cassandra will run for a minute, and then exit without
>>>>>> outputting any error in the log file. It is happening while SSTableReader
>>>>>> is opening a couple hundred thousand things.
>>>>>>
>>>>> ...
>>>>>
>>>>>> Does anyone know how I might get Cassandra on this node running
>>>>>> again? I'm not very familiar with correctly tuning Java memory
>>>>>> parameters, and I'm not sure if that's the right solution in this
>>>>>> case anyway.
>>>>>>
>>>>>
>>>>> Try running 2.1.1, and/or increasing heap size beyond 8gb.
>>>>>
>>>>> Are there actually that many SSTables on disk?
>>>>>
>>>>> =Rob
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
