According to the timestamps (see original post) the SSTable was written
(and thus compacted) 3 days after all columns for that row had
expired and 6 days after the row was created; yet all columns are still
showing up in the SSTable.  Note that a "get" for that key correctly
returns no rows, so reads are working, but the data is
lugged around far longer than it should be -- maybe forever.
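For anyone checking the math: the column timestamps reported by sstable2json are microseconds since the epoch, so the write and expiry times above can be verified with a quick shell calculation (a sketch; `date -d @<secs>` is GNU coreutils, and 259200 is the 72-hour TTL from our schema):

```shell
# Column timestamp reported by sstable2json, in microseconds since the epoch
ts_us=1357785277207001
ttl_s=259200                      # 72-hour TTL applied at write time

written=$(( ts_us / 1000000 ))    # truncate to whole seconds
expired=$(( written + ttl_s ))

date -u -d "@$written"            # column written:  Thu Jan 10 02:34:37 UTC 2013
date -u -d "@$expired"            # TTL expired:     Sun Jan 13 02:34:37 UTC 2013
```

Both dates fall well before the Jan 16 mtime on the SSTable, which is the gap in question.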


-Bryan


On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:

> To get a column removed you have to meet two requirements:
> 1. the column must be expired
> 2. after that, the CF gets compacted
>
> I guess your expired columns are propagated to a higher tier, which gets
> compacted rarely.
> So you have to wait until the higher tier gets compacted.
>
> Andrey
>
>
>
> On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
>
>> On cassandra 1.1.5 with a write heavy workload, we're having problems
>> getting rows to be compacted away (removed) even though all columns have
>> expired TTL.  We've tried size tiered and now leveled and are seeing the
>> same symptom: the data stays around essentially forever.
>>
>> Currently we write all columns with a TTL of 72 hours (259200 seconds)
>> and expect to add 10 GB of data to this CF per day per node.  Each node
>> currently has 73 GB for the affected CF and shows no indications that old
>> rows will be removed on their own.
>>
>> Why aren't rows being removed?  Below is some data from a sample row
>> which should have been removed several days ago but is still around even
>> though it has been involved in numerous compactions since being expired.
>>
>> $> ./bin/nodetool -h localhost getsstables metrics request_summary
>> 459fb460-5ace-11e2-9b92-11d67b6163b4
>>
>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>
>> $> ls -alF
>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>> -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42
>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>>
>> $> ./bin/sstable2json
>> /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
>> -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 "%x"')
>> {
>> "34353966623436302d356163652d313165322d396239322d313164363762363136336234":
>> [["app_name","50f21d3d",1357785277207001,"d"],
>> ["client_ip","50f21d3d",1357785277207001,"d"],
>> ["client_req_id","50f21d3d",1357785277207001,"d"],
>> ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
>> ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
>> ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
>> ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
>> ["req_duration_us","50f21d3d",1357785277207001,"d"],
>> ["req_finish_time_us","50f21d3d",1357785277207001,"d"],
>> ["req_method","50f21d3d",1357785277207001,"d"],
>> ["req_service","50f21d3d",1357785277207001,"d"],
>> ["req_start_time_us","50f21d3d",1357785277207001,"d"],
>> ["success","50f21d3d",1357785277207001,"d"]]
>> }
>>
>>
>> Decoding the column timestamps shows that the columns were written at
>> "Thu, 10 Jan 2013 02:34:37 GMT" and that their TTL expired at "Sun, 13 Jan
>> 2013 02:34:37 GMT".  The date of the SSTable shows that it was generated on
>> Jan 16, which is 3 days after all columns had TTL-ed out.
>>
>>
>> The schema shows that gc_grace is set to 0 since this data is write-once,
>> read-seldom and is never updated or deleted.
>>
>> create column family request_summary
>>   with column_type = 'Standard'
>>   and comparator = 'UTF8Type'
>>   and default_validation_class = 'UTF8Type'
>>   and key_validation_class = 'UTF8Type'
>>   and read_repair_chance = 0.1
>>   and dclocal_read_repair_chance = 0.0
>>   and gc_grace = 0
>>   and min_compaction_threshold = 4
>>   and max_compaction_threshold = 32
>>   and replicate_on_write = true
>>   and compaction_strategy =
>> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
>>   and caching = 'NONE'
>>   and bloom_filter_fp_chance = 1.0
>>   and compression_options = {'chunk_length_kb' : '64',
>> 'sstable_compression' :
>> 'org.apache.cassandra.io.compress.SnappyCompressor'};
>>
>>
>> Thanks in advance for help in understanding why rows such as this are not
>> removed!
>>
>> -Bryan
>>
>>
>
