We have seen this issue while using LCS: around 100K sstables got
generated, compactions were not able to catch up, and the node became
unresponsive. The reason was that one of the sstables got corrupted,
compaction was effectively hanging on that sstable, and further sstables
kept being flushed. After running nodetool scrub all went well.
So corruption might also be the reason here. After running nodetool scrub
you will see a message in system.log if any of the sstables is corrupted,
and nodetool scrub will also clean up the corruption.
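For reference, this is roughly what we ran (the keyspace/table names below
are just placeholders and the log path is the common package default, so
adjust them for your install):

    # scrub the suspect table; with no arguments it scrubs all keyspaces
    nodetool scrub my_keyspace my_table

    # then check system.log for corruption reports around compactions
    grep -i "corrupt" /var/log/cassandra/system.log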

On 25 October 2016 at 11:30, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> I have not read the entire thread so sorry if this is already mentioned.
> You should review your logs; a potential problem could be a corrupted
> sstable.
>
> In a situation like this you will notice that the system is repeatedly
> trying to compact a given sstable. The compaction fails; based on its
> heuristics it may still successfully compact some other files, but each
> time it attempts a compaction involving the bad sstable the process
> fails again and the number of files keeps growing.
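>
> As a rough way to confirm this (the paths below are just the common
> defaults, so adjust them for your layout), grep the logs for the failing
> compaction and watch whether the file count keeps climbing, e.g.:
>
>     grep -i "CorruptSSTableException" /var/log/cassandra/system.log
>     ls /var/lib/cassandra/data/<keyspace>/<table>-*/ | wc -l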
>
> Good luck,
> Edward
>
> On Tue, Oct 25, 2016 at 10:31 AM, DuyHai Doan <doanduy...@gmail.com>
> wrote:
>
>> What are your disk hardware specs?
>>
>> On Tue, Oct 25, 2016 at 8:47 AM, Lahiru Gamathige <lah...@highfive.com>
>> wrote:
>>
>>> Hi Users,
>>>
>>> I have a single server codebase deployed in multiple environments
>>> (staging, dev, etc.). They all use a single Cassandra cluster, but the
>>> keyspaces are prefixed with the environment name, so each server has its
>>> own keyspace to store data. I am using Cassandra 2.1.0 and using it to
>>> store timeseries data.
>>>
>>> I see thousands of SSTables on only one node for one environment, and
>>> that node is running out of memory because of this (I am guessing that's
>>> the cause because I see lots of log entries about trying to compact that
>>> data). All the other nodes, which also serve the other environments, work
>>> just fine, but this one environment keeps having the issue.
>>> Given that explanation I have two main questions.
>>>
>>> Has any of you had a similar issue? If so, how did you solve it?
>>>
>>> If I want to clean only this keyspace across the full cluster, what are
>>> the steps I should follow?
>>>
>>> Do you think shutting down the cluster, deleting the folder for the
>>> keyspace on all the nodes, and restarting the cluster would do the job?
>>> Are there any other steps I need to follow?
>>> (The reason I ask is that if I just truncate from cql the data will
>>> still be on disk, and since there is clearly something seriously wrong
>>> with that table I'm not sure it will ever get cleaned up.)
>>>
>>> Thanks
>>> Lahiru
>>>
>>
>>
>
