Hi Mike,

It sounds like that record may have been deleted. If that is the case, the 
data would still be shown in this sstable, but the tombstone for the delete 
would be in a later sstable. You can use nodetool getsstables to work out 
which sstables contain the data.
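
For example, something along these lines (the keyspace, table, key and file 
names are placeholders to substitute with your own):
```
# Which sstables on this node contain that partition key?
nodetool getsstables my_keyspace my_table some_user_id_value
# (a composite partition key may need the key argument in a different format)

# Then dump any later sstable and look for a tombstone for that partition:
sstabledump /path/to/md-1234-big-Data.db
```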

I recommend reading The Last Pickle post on this: 
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html. The sections 
towards the bottom of that post may well explain why the sstable is not 
being deleted.

Thanks 

Paul
www.redshots.com

> On 2 May 2019, at 16:08, Mike Torra <mto...@salesforce.com.INVALID> wrote:
> 
> I'm pretty stumped by this, so here is some more detail if it helps.
> 
> Here is what the suspicious partition looks like in the `sstabledump` output 
> (some pii etc redacted):
> ```
> {
>     "partition" : {
>       "key" : [ "some_user_id_value", "user_id", "demo-test" ],
>       "position" : 210
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 1132,
>         "clustering" : [ "2019-01-22 15:27:45.000Z" ],
>         "liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
>         "cells" : [
>           { "some": "data" }
>         ]
>       }
>     ]
>   }
> ```
> 
> And here is what every other partition looks like:
> ```
> {
>     "partition" : {
>       "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
>       "position" : 1133
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 1234,
>         "clustering" : [ "2019-01-22 17:59:35.547Z" ],
>         "liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" : 86400, "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
>         "cells" : [
>           { "name" : "activity_data", "deletion_info" : { "local_delete_time" : "2019-01-22T17:59:35Z" } }
>         ]
>       }
>     ]
>   }
> ```
> 
> As expected, almost all of the data has a ttl and is already expired, except 
> for this one suspicious partition. But if a partition isn't expired and I can 
> see it in the sstable, why wouldn't I see it when executing a CQL query 
> against the CF? And why would this one sstable prevent so many other sstables 
> from getting cleaned up?
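> 
> (For reference, the query looks roughly like this; the column names are 
> assumptions based on the composite partition key in the dump above:)
> ```
> cqlsh -e "SELECT * FROM my_keyspace.my_table WHERE id = 'some_user_id_value' AND id_type = 'user_id' AND site_id = 'demo-test';"
> ```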
> 
> On Tue, Apr 30, 2019 at 12:34 PM Mike Torra <mto...@salesforce.com> wrote:
> Hello -
> 
> I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few 
> months ago I started noticing disk usage on some nodes increasing 
> consistently. At first I solved the problem by destroying the nodes and 
> rebuilding them, but the problem kept coming back.
> 
> I did some more investigation recently, and this is what I found:
> - I narrowed the problem down to a CF that uses TWCS, by simply looking at 
> disk space usage
> - in each region, 3 nodes have this problem of growing disk space (matches 
> replication factor)
> - on each node, I tracked down the problem to a particular SSTable using 
> `sstableexpiredblockers`
> - in the SSTable, using `sstabledump`, I found a row that, unlike the other 
> rows, does not have a ttl; it appears to be from someone on the team testing 
> something and forgetting to include a ttl
> - all other rows show "expired: true" except this one, hence my suspicion
> - when I query for that particular partition key, I get no results
> - I tried deleting the row anyway, but that didn't seem to change anything
> - I also tried `nodetool scrub`, but that didn't help either (the commands 
> are sketched below)
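> 
> For reference, the commands were along these lines (the keyspace, table, and 
> column names are placeholders/assumptions, as above):
> ```
> # Per-table disk usage, to spot the growing CF:
> nodetool tablestats my_keyspace.my_table
> 
> # Find which sstable is blocking expired sstables from being dropped:
> sstableexpiredblockers my_keyspace my_table
> 
> # Attempt to delete the rogue partition, then rewrite the sstables:
> cqlsh -e "DELETE FROM my_keyspace.my_table WHERE id = 'some_user_id_value' AND id_type = 'user_id' AND site_id = 'demo-test';"
> nodetool scrub my_keyspace my_table
> ```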
> 
> Would this rogue row without a ttl explain the problem? If so, why? If not, 
> does anyone have any other ideas? Why does the row show in `sstabledump` but 
> not when I query for it?
> 
> I appreciate any help or suggestions!
> 
> - Mike
