Thanks for the insights, Jeff! I did go through the tickets around dropping
expired sstables that have overlaps - from what I understand, the only
undesirable impact of that would be possible data resurrection.
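
To spell out that risk (a hypothetical sketch - our tables issue no
explicit deletes, and the names below are made up): an "expired" sstable
can still carry a tombstone shadowing live data in an overlapping
sstable, and dropping it wholesale discards the tombstone.

    -- a row written without TTL lands in an sstable in an old window
    INSERT INTO ks.events (key, value) VALUES ('k1', 'v1');
    -- a later delete puts a tombstone in a newer-window sstable
    DELETE FROM ks.events WHERE key = 'k1';
    -- if the newer sstable is judged fully expired and dropped despite
    -- the overlap, the tombstone goes with it and 'k1' is readable again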
I have now attached the output of sstableslicer to this mail. Will submit
a patch for review.
On Tue, Aug 8, 2017 at 9:49 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> The most likely cause is read repair triggered by the consistency level
> itself (digest mismatch), which the read_repair_chance settings do not
> control. The only way to actually eliminate read repair is to read with
> CL:ONE, which almost nobody does (at least in time series use cases,
> because it implies you probably write with ALL, or run repair, which -
> as you've noted - often isn't necessary in ttl-only use cases).
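
For reference, the 2.1-era knobs in play here look roughly like this
(keyspace/table names are made up); note that the chance-based settings
only govern probabilistic read repair and do not stop digest-mismatch
repair at CL > ONE:

    -- disable probabilistic (background) read repair on the table
    ALTER TABLE ks.events
      WITH read_repair_chance = 0
      AND dclocal_read_repair_chance = 0;

    -- in cqlsh: reads at CL ONE skip digest comparison entirely
    CONSISTENCY ONE;
    SELECT value FROM ks.events WHERE key = 'k1';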
> I can't see the image, but more tools for understanding sstable state are
> never a bad thing (as long as they're generally useful and maintainable).
> For what it's worth, there are tickets in flight for being more aggressive
> at dropping overlaps, but there are companies that use tools that stop the
> cluster, use sstablemetadata to identify sstables known to be fully
> expired, and manually remove them (/bin/rm) before starting cassandra
> again. It works reasonably well IF (and only if) you write all data with
> TTLs, and you can identify fully expired sstables based on maximum
> timestamp.
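
For reference, a rough sketch of that manual workflow (paths, names, and
the generation number are made up, and sstablemetadata's exact output
format varies by version):

    # with cassandra stopped on the node
    sstablemetadata /var/lib/cassandra/data/ks/events-*/ks-events-ka-1234-Data.db \
        | grep -i timestamp
    # if maximum timestamp + TTL + gc_grace is safely in the past,
    # remove every component of that sstable generation
    rm /var/lib/cassandra/data/ks/events-*/ks-events-ka-1234-*
    # and start cassandra again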
> On Tue, Aug 8, 2017 at 8:51 PM, Sumanth Pasupuleti <
> sumanth.pasupuleti...@gmail.com> wrote:
> > Hi,
> >> We use TWCS in a few of the column families that hold TTL-based
> >> time-series data, and no explicit deletes are issued. Over time, we
> >> have observed disk usage increasing beyond the expected levels. The
> >> data directory on a particular node shows SSTables that are more than
> >> 16 days old, while the bucket size is configured at 12 hours, TTL is
> >> at 15 days, and GC grace at 1 hour.
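
For concreteness, the settings described above correspond roughly to a
table definition like the following (keyspace/table names are made up;
on 2.1, TWCS ships as an external jar, so 'class' would be its fully
qualified class name rather than the short form):

    ALTER TABLE ks.events WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'HOURS',
        'compaction_window_size': '12'
      }
      AND default_time_to_live = 1296000  -- 15 days
      AND gc_grace_seconds = 3600;        -- 1 hour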
> >> Upon running sstableexpiredblockers, we got quite a few sets of blocking
> >> and blocked SSTables. The SSTable metadata shown in its output indicates
> >> an overlap in the MinTS-MaxTS period between the blocking SSTable
> >> and the blocked SSTables, which is preventing the older SSTables from
> >> getting dropped/deleted.
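
For anyone reproducing this, sstableexpiredblockers takes the keyspace
and table name and prints each blocking SSTable along with the expired
SSTables it blocks (output shape paraphrased, not exact):

    sstableexpiredblockers ks events
    # [sstable X] (minTS, maxTS, maxLDT) blocks 2 expired sstables from
    # getting dropped: [sstable Y (minTS, maxTS, maxLDT), sstable Z (...)]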
> >> Following are the possible root causes we considered:
> >> 1. Hints - old data hints getting replayed from the coordinator node.
> >> We ruled this out, since hints live for no more than 1 day based on
> >> our configuration.
> >> 2. External compactions - no external compactions were run that
> >> could have compacted SSTables across the TWCS buckets.
> >> 3. Read repairs - ruled out as well, since we never ran external
> >> repairs, and read repair chance on the TWCS column families has
> >> been set to 0.
> >> 4. Application team writing data with older timestamps (in newer
> >> SSTables).
> >> 1. We wanted to identify the specific row keys with older timestamps
> >> in the blocking SSTable that could be causing this issue to occur.
> >> We considered using sstablekeys/sstable2json; however, since both
> >> tools output the entire content/keys of the SSTable in key order,
> >> they were not helpful in this case.
> >> 2. Since we wanted to see the few oldest cells along with their
> >> timestamps, we created a tool, sstableslicer, based largely on
> >> sstable2json, to output the 'n' top/bottom cells in an SSTable,
> >> ordered either by writetime or localDeletionTime. This helped us
> >> identify the specific cells in new SSTables with older timestamps,
> >> which further helped debugging on the application end. From the
> >> application team's perspective, however, data with old timestamps is
> >> not a possible scenario (see the writetime() cross-check sketch after
> >> the sample output below).
> >> 3. Below is a sample output of sstableslicer:
> >> [image: sample sstableslicer output (inline image not shown)]
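
As a cheap application-level cross-check of suspicion 4, writetime()
exposes the timestamp a client actually wrote for a cell (keyspace,
table, and column names are made up):

    -- writetime() is in microseconds since epoch; a value far older
    -- than the insert's wall-clock time points at clients supplying
    -- their own USING TIMESTAMP
    SELECT key, value, writetime(value), ttl(value)
    FROM ks.events WHERE key = 'k1';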
> >> Looking for suggestions, especially around the following two things:
> >> 1. Did we miss any other case in TWCS that could be causing such an
> >> overlap?
> >> 2. Does sstableslicer seem valuable enough to be included in Apache C*?
> >> If yes, I shall create a JIRA and submit a PR/patch for review.
> >> The C* version we use is 2.1.17.
> >> Thanks,
> >> Sumanth