According to the timestamps (see original post), the SSTable was written (and thus compacted) 3 days after all columns for that row had expired and 6 days after the row was created; yet all of the columns still show up in the SSTable. Note that a "get" for that key correctly returns no rows, so reads behave as expected, but the data is lugged around far longer than it should be -- maybe forever.
-Bryan

On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:

> To get a column removed you have to meet two requirements:
> 1. the column should be expired
> 2. after that, the CF gets compacted
>
> I guess your expired columns are propagated to a high tier, which gets
> compacted rarely. So you have to wait until the high tier gets compacted.
>
> Andrey
>
> On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
>
>> On cassandra 1.1.5 with a write-heavy workload, we're having problems
>> getting rows compacted away (removed) even though all columns have
>> expired TTLs. We've tried size-tiered and now leveled compaction and are
>> seeing the same symptom: the data stays around essentially forever.
>>
>> Currently we write all columns with a TTL of 72 hours (259200 seconds)
>> and expect to add 10 GB of data to this CF per day per node. Each node
>> currently has 73 GB for the affected CF and shows no indication that old
>> rows will be removed on their own.
>>
>> Why aren't rows being removed? Below is some data from a sample row
>> which should have been removed several days ago but is still around even
>> though it has been involved in numerous compactions since expiring.
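[Editor's note: a quick sanity check on the numbers quoted above. This is a simplified sketch of the purge rule Andrey describes -- expired, then compacted -- not Cassandra's actual compaction code, and the epoch values are taken or approximated from the timestamps quoted in the thread.]

```python
# Editor's sketch (not Cassandra internals): an expired column can only be
# dropped by a compaction that runs at or after expiration + gc_grace.
def purgeable_at(write_time_s, ttl_s, gc_grace_s, compaction_time_s):
    return compaction_time_s >= write_time_s + ttl_s + gc_grace_s

write_time = 1357785277      # Thu, 10 Jan 2013 02:34:37 GMT (from the thread)
compaction = 1358325720      # roughly Jan 16 08:42, the SSTable's mtime
print(purgeable_at(write_time, 259200, 0, compaction))  # True: should be droppable

# Back-of-the-envelope steady state: write rate * retention.
# 10 GB/day with a 3-day TTL should level off near 30 GB, not the 73 GB observed.
expected_gb = 10 * (259200 / 86400)
print(expected_gb)  # 30.0
```

By this model the Jan 16 compaction ran well past expiration, so the columns should have been droppable -- which is exactly the puzzle in this thread.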
>>
>> $> ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4
>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>
>> $> ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>> -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>
>> $> ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db \
>>      -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
>> {
>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>>   [["app_name","50f21d3d",1357785277207001,"d"],
>>    ["client_ip","50f21d3d",1357785277207001,"d"],
>>    ["client_req_id","50f21d3d",1357785277207001,"d"],
>>    ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>>    ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>>    ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>>    ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>>    ["req_duration_us","50f21d3d",1357785277207001,"d"],
>>    ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>>    ["req_method","50f21d3d",1357785277207001,"d"],
>>    ["req_service","50f21d3d",1357785277207001,"d"],
>>    ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>>    ["success","50f21d3d",1357785277207001,"d"]]
>> }
>>
>> Decoding the column timestamps shows that the columns were written at
>> "Thu, 10 Jan 2013 02:34:37 GMT" and that their TTL expired at "Sun, 13 Jan
>> 2013 02:34:37 GMT". The date of the SSTable shows that it was generated on
>> Jan 16, which is 3 days after all columns had TTL-ed out.
>>
>> The schema shows that gc_grace is set to 0 since this data is write-once,
>> read-seldom, and is never updated or deleted.
>>
>> create column family request_summary
>>   with column_type = 'Standard'
>>   and comparator = 'UTF8Type'
>>   and default_validation_class = 'UTF8Type'
>>   and key_validation_class = 'UTF8Type'
>>   and read_repair_chance = 0.1
>>   and dclocal_read_repair_chance = 0.0
>>   and gc_grace = 0
>>   and min_compaction_threshold = 4
>>   and max_compaction_threshold = 32
>>   and replicate_on_write = true
>>   and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>>   and caching = 'NONE'
>>   and bloom_filter_fp_chance = 1.0
>>   and compression_options = {'chunk_length_kb' : '64',
>>     'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
>>
>> Thanks in advance for help in understanding why rows such as this are not removed!
>>
>> -Bryan
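[Editor's note: the timestamp decoding and key encoding quoted in the thread can be reproduced as follows. Column timestamps in sstable2json output are microseconds since the Unix epoch, and the -k argument is the hex encoding of the raw key bytes -- the same thing the `echo -n ... | hexdump` pipeline produces. Day/month name formatting assumes an English (C) locale.]

```python
from datetime import datetime, timezone

# sstable2json column timestamps are microseconds since the Unix epoch.
write_ts_us = 1357785277207001
ttl_s = 259200  # 72-hour TTL from the thread

written = datetime.fromtimestamp(write_ts_us / 1_000_000, tz=timezone.utc)
expires = datetime.fromtimestamp(write_ts_us / 1_000_000 + ttl_s, tz=timezone.utc)
print(written.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Thu, 10 Jan 2013 02:34:37 GMT
print(expires.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Sun, 13 Jan 2013 02:34:37 GMT

# The -k argument is the hex of the row key's ASCII bytes, matching the
# key shown in the JSON dump.
key = "459fb460-5ace-11e2-9b92-11d67b6163b4"
print(key.encode("ascii").hex())
```

Running this confirms the dates quoted in the original post and reproduces the hex row key from the sstable2json output.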