@Bryan,

To keep the data size as low as possible for CFs with TTL columns, we still use
STCS with nightly major compactions.

Our experience with LCS was not successful: the data size stayed too high, as
did the number of compactions.

IMO, before 1.2, LCS was only a good fit for CFs without TTLs or a high delete
rate. I have not tested the 1.2 LCS behavior; we're still on 1.0.x.
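For reference, the nightly major compaction is just `nodetool compact` on the
CF, driven from cron. A sketch of such a crontab entry (the installation path
and keyspace/CF names here are placeholders, not from this thread):

```
# Run a major compaction of one CF every night at 03:00.
# Path, keyspace, and CF names are illustrative.
0 3 * * * /opt/cassandra/bin/nodetool -h localhost compact MyKeyspace MyCF
```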


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>




From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, January 17, 2013 06:24
To: user@cassandra.apache.org
Subject: Re: LCS not removing rows with all TTL expired columns

Minor compaction (with Size Tiered) will only purge tombstones if all fragments 
of a row are contained in the SSTables being compacted. So if you have a 
long-lived row that is present in many size tiers, the columns will not be purged.

 (thus compacted) 3 days after all columns for that row had expired
Tombstones have to get to disk, even if you set gc_grace_seconds to 0. If they 
did not, they would never get a chance to delete previous versions of the column 
which already exist on disk. So when the compaction ran, your ExpiringColumn was 
turned into a DeletedColumn and written to disk.
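A minimal Python sketch of the purge decision described above (illustrative
only; this is not Cassandra's actual implementation, and the field names are
invented):

```python
import time

GC_GRACE_SECONDS = 0  # matches Bryan's schema below

def on_compaction(col, now, all_row_fragments_in_this_compaction):
    """Decide what a compaction writes out for one column (simplified)."""
    if col["kind"] == "expiring" and now >= col["local_expiration"]:
        # An expired ExpiringColumn is rewritten as a DeletedColumn (tombstone);
        # it must reach disk so it can shadow older on-disk versions.
        col = {"kind": "tombstone", "deleted_at": col["local_expiration"]}
    if (col["kind"] == "tombstone"
            and now >= col["deleted_at"] + GC_GRACE_SECONDS
            and all_row_fragments_in_this_compaction):
        return None  # safe to purge: drop the column from the output SSTable
    return col  # otherwise the tombstone is carried into the new SSTable

expired = {"kind": "expiring", "local_expiration": time.time() - 86400}
print(on_compaction(expired, time.time(), False))  # tombstone kept: row spans other tiers
print(on_compaction(expired, time.time(), True))   # None: all fragments present, purged
```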

I would expect the next round of compaction to remove these columns.

There is a new feature in 1.2 that may help you here. It will do a special 
compaction of an individual SSTable when it holds a certain proportion of dead 
columns: https://issues.apache.org/jira/browse/CASSANDRA-3442
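Per that ticket, the trigger is a tunable compaction sub-option. A hedged
cassandra-cli sketch of setting it (the `tombstone_threshold` option name and
its 0.2 default are from the ticket; verify against your 1.2 release before
relying on this):

```
update column family request_summary
  with compaction_strategy_options = {'tombstone_threshold' : '0.2'};
```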

Also interested to know if LCS helps.

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/01/2013, at 2:55 PM, Bryan Talbot <btal...@aeriagames.com> wrote:


According to the timestamps (see original post) the SSTable was written (thus 
compacted) 3 days after all columns for that row had expired and 6 days after 
the row was created; yet all columns still show up in the SSTable.  Note that 
a "get" for that key correctly returns no rows, so reads are working, but the 
data is lugged around far longer than it should be -- maybe forever.


-Bryan

On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
To get a column removed, two requirements have to be met:
1. the column must be expired
2. after that, the CF must get compacted

I guess your expired columns have been propagated to a high tier, which gets 
compacted rarely. So you have to wait until that high tier gets compacted.

Andrey


On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
On cassandra 1.1.5 with a write-heavy workload, we're having problems getting 
rows compacted away (removed) even though all of their columns have expired TTLs.  
We've tried size-tiered and now leveled compaction and are seeing the same 
symptom: the data stays around essentially forever.

Currently we write all columns with a TTL of 72 hours (259200 seconds) and 
expect to add 10 GB of data to this CF per day per node.  Each node currently 
has 73 GB for the affected CF and shows no indication that old rows will be 
removed on their own.

Why aren't rows being removed?  Below is some data from a sample row which 
should have been removed several days ago but is still around even though it 
has been involved in numerous compactions since being expired.

$> ./bin/nodetool -h localhost getsstables metrics request_summary 
459fb460-5ace-11e2-9b92-11d67b6163b4
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$> ls -alF 
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
-rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$> ./bin/sstable2json 
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 "%x"')
{
"34353966623436302d356163652d313165322d396239322d313164363762363136336234": 
[["app_name","50f21d3d",1357785277207001,"d"], 
["client_ip","50f21d3d",1357785277207001,"d"], 
["client_req_id","50f21d3d",1357785277207001,"d"], 
["mysql_call_cnt","50f21d3d",1357785277207001,"d"], 
["mysql_duration_us","50f21d3d",1357785277207001,"d"], 
["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"], 
["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"], 
["req_duration_us","50f21d3d",1357785277207001,"d"], 
["req_finish_time_us","50f21d3d",1357785277207001,"d"], 
["req_method","50f21d3d",1357785277207001,"d"], 
["req_service","50f21d3d",1357785277207001,"d"], 
["req_start_time_us","50f21d3d",1357785277207001,"d"], 
["success","50f21d3d",1357785277207001,"d"]]
}
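The hexdump pipeline above simply hex-encodes the ASCII key bytes, which is the
format `sstable2json -k` expects, and the row key in the JSON output is that
same hex string. A quick Python sanity check (illustrative, not part of the
original workflow):

```python
key = "459fb460-5ace-11e2-9b92-11d67b6163b4"

# Equivalent of: echo -n <key> | hexdump -e '36/1 "%x"'
hex_key = key.encode("ascii").hex()
print(hex_key)

# The row key in the sstable2json output decodes back to the original string.
assert bytes.fromhex(hex_key).decode("ascii") == key
```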


Decoding the column timestamps shows that the columns were written at "Thu, 
10 Jan 2013 02:34:37 GMT" and that their TTL expired at "Sun, 13 Jan 2013 
02:34:37 GMT".  The date of the SSTable shows that it was generated on Jan 16, 
which is 3 days after all columns had TTL-ed out.
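The decoding itself is straightforward: the timestamps are microseconds since
the epoch, and the TTL is the 259200 s from above. A small Python check, using
the values from the sstable2json output:

```python
from datetime import datetime, timezone

TTL_SECONDS = 259200           # 72 hours, per the original post
ts_us = 1357785277207001       # column timestamp from sstable2json (microseconds)

written = datetime.fromtimestamp(ts_us / 1_000_000, tz=timezone.utc)
expired = datetime.fromtimestamp(ts_us / 1_000_000 + TTL_SECONDS, tz=timezone.utc)
print(written.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Thu, 10 Jan 2013 02:34:37 GMT
print(expired.strftime("%a, %d %b %Y %H:%M:%S GMT"))  # Sun, 13 Jan 2013 02:34:37 GMT
```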


The schema shows that gc_grace is set to 0 since this data is write-once, 
read-seldom and is never updated or deleted.

create column family request_summary
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 0
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE'
  and bloom_filter_fp_chance = 1.0
  and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' : 
'org.apache.cassandra.io.compress.SnappyCompressor'};


Thanks in advance for help in understanding why rows such as this are not 
removed!

-Bryan



