Do you have any snapshots on the nodes where you are seeing this issue?
Snapshots create hard links to the sstable files, which will prevent obsolete sstables from being deleted.
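A quick way to check (assuming the default data directory of /var/lib/cassandra/data; adjust the path to your layout):

find /var/lib/cassandra/data -type d -name snapshots

# if there are snapshots you no longer need, clear them on that node
nodetool clearsnapshot

Snapshots from old backups or repairs are easy to forget about, and each one keeps a link to every sstable that existed when it was taken.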

-Arindam

From: Narendra Sharma [mailto:narendra.sha...@gmail.com]
Sent: Sunday, December 15, 2013 1:15 PM
To: user@cassandra.apache.org
Subject: Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match

We have an 8-node cluster. Replication factor is 3.

For some of the nodes, the disk usage (du -ksh .) in the data directory for the CF 
doesn't match the Load reported by the nodetool ring command. When we expanded the 
cluster from 4 nodes to 8 nodes (4 weeks back), everything was okay. Over the last 
2-3 weeks the disk usage has gone up. We increased the RF from 2 to 3 two weeks ago.

I am not sure if increasing the RF is causing this issue.

For one of the nodes that I analyzed:
1. nodetool ring reported load as 575.38 GB

2. nodetool cfstats for the CF reported:
SSTable count: 28
Space used (live): 572671381955
Space used (total): 572671381955


3. 'ls -1 *Data* | wc -l' in the data folder for the CF returned 46

4. 'du -ksh .' in the data folder for the CF returned 720G
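As a cross-check (a rough idea only, assuming a single Cassandra process and that lsof 
is available on the node), I was thinking of comparing the Data files on disk against 
the ones the running process actually has open or mapped:

# Data files the process has open/mapped (dedup: a file can show up as both fd and mem)
lsof -p "$(pgrep -f CassandraDaemon)" 2>/dev/null | awk '/Data\.db/ {print $NF}' | sort -u > /tmp/open_data_files.txt

# Data files present in the CF data directory (run from that directory)
ls -1 "$PWD"/*-Data.db | sort > /tmp/on_disk_data_files.txt

# files on disk that the process does not reference: candidates for being obsolete, not proof
comm -13 /tmp/open_data_files.txt /tmp/on_disk_data_files.txt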

The mismatch between the 46 Data files on disk and the live SSTable count of 28 indicates 
that some sstables are obsolete yet still occupying space on disk. What could be wrong? 
Will restarting the node help? The Cassandra process has been running for the last 45 days 
with no downtime. However, because the disk usage is already high, we are not able to run 
a full compaction.
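If I understand 1.1.x behaviour correctly, an sstable that has been compacted away leaves 
a zero-byte *-Compacted marker next to it until its files are actually deleted, so 
something like this (run in the CF data directory) should show how much space the 
already-obsolete generations are holding; please correct me if 1.1.6 works differently:

# count compacted-away sstables still on disk
ls -1 *-Compacted 2>/dev/null | wc -l

# space still held by their Data files
for marker in *-Compacted; do
  [ -e "$marker" ] || continue    # no markers at all
  du -sh "${marker%Compacted}Data.db" 2>/dev/null
done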

Also, I can't find a reference to every on-disk sstable in the system.log file. For 
example, ls -lth shows one data file as:
86G Nov 20 06:14

The first line of my system.log file is:
INFO [main] 2013-11-18 09:41:56,120 AbstractCassandraDaemon.java (line 101) 
Logging initialized

The 86G file must be the result of some compaction, yet I see no reference to this data 
file in system.log between 11/18 and 11/25. What could be the reason for that? The only 
reference is dated 11/29, when the file was being streamed to another (new) node.
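One thing I still plan to try (assuming the default log location /var/log/cassandra/ and 
the stock log4j rotation) is to grep the current and rotated logs for the file's 
generation number, in case the compaction was logged to a file that has since rotated:

# replace 1234 with the generation number from the sstable file name
zgrep -i -e '-1234-' /var/log/cassandra/system.log* 2>/dev/null | grep -Ei 'compact|stream|flush'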

How can I identify the obsolete files and remove them? I am thinking about the following; 
let me know if it makes sense (a rough command-level sketch follows the list).
1. Restart the node and check the state.
2. Move the oldest data files to another location (to another mount point)
3. Restart the node again
4. Run repair on the node so that it can get the missing data from its peers.
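Very roughly, steps 2-4 would look like this (the keyspace/CF names, the generation 
number, and /mnt/spare are placeholders, and the init-script name depends on the install):

sudo service cassandra stop

# move every component of the suspect generation(s), not just the Data file
mkdir -p /mnt/spare/old-sstables
mv MyKeyspace-MyCF-hd-1234-* /mnt/spare/old-sstables/

sudo service cassandra start

# once the node is back up, repair this CF so anything that lived only in the
# moved files is pulled back from the replicas
nodetool repair MyKeyspace MyCF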


For comparison, here are the numbers from a healthy node for the same CF:
1. nodetool ring reported load as 662.95 GB

2. nodetool cfstats for the CF reported:
SSTable count: 16
Space used (live): 670524321067
Space used (total): 670524321067

3. 'ls -1 *Data* | wc -l' in the data folder for the CF returned 16

4. 'du -ksh .' in the data folder for the CF returned 625G


-Naren



--
Narendra Sharma
Software Engineer
http://www.aeris.com
http://narendrasharma.blogspot.com/
