Hi Eugene,

Common contributors to overlapping SSTables are:
1. Hints
2. Repairs
3. New writes with old timestamps (should be rare but technically possible)
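As an aside, overlap of this kind can be spotted from the min/max writetimes that `sstablemetadata` reports for each SSTable. A minimal sketch of the interval check (the file names and timestamps below are made up for illustration; in practice you would parse them from tool output):

```python
# Sketch: flag SSTables whose [min_writetime, max_writetime] ranges
# intersect. Names and timestamps are hypothetical examples.

def overlapping_pairs(sstables):
    """Return pairs of SSTable names whose timestamp ranges intersect."""
    items = sorted(sstables.items(), key=lambda kv: kv[1][0])  # by min ts
    pairs = []
    for i, (name_a, (min_a, max_a)) in enumerate(items):
        for name_b, (min_b, _) in items[i + 1:]:
            if min_b <= max_a:  # next table starts before this one ends
                pairs.append((name_a, name_b))
            else:
                break  # sorted by min ts, so no later table can overlap
    return pairs

tables = {
    "mc-1-big": (1000, 2000),  # writetime range of each SSTable
    "mc-2-big": (2100, 3000),
    "mc-3-big": (1500, 2600),  # straddles two windows -> overlaps both
}
print(overlapping_pairs(tables))
# -> [('mc-1-big', 'mc-3-big'), ('mc-3-big', 'mc-2-big')]
```

A per-key tool like the one described below refines this by telling you *which* keys cause the overlap, not just which files.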
I would not run repairs with TWCS - as you indicated, they result in
overlapping SSTables, which impacts disk space and read latency since reads
now have to touch multiple SSTables. As for
https://issues.apache.org/jira/browse/CASSANDRA-13418, I would not worry
about data resurrection as long as all writes carry a TTL. We faced similar
overlap issues with TWCS (in our case due to dclocal_read_repair_chance), so
we developed an SSTable tool that reports the topN or bottomN keys in an
SSTable by writetime/deletion time; we used it to identify the specific keys
responsible for the overlap between SSTables.

Thanks,
Sumanth

On Mon, Oct 9, 2017 at 6:36 PM, eugene miretsky <eugene.miret...@gmail.com> wrote:

> Thanks Alain!
>
> We are using TWCS compaction, and I read your blog multiple times - it
> was very useful, thanks!
>
> We are seeing a lot of overlapping SSTables, leading to a lot of
> problems: (a) a large number of tombstones read in queries, (b) high CPU
> usage, and (c) fairly long young-gen GC pauses (300ms).
>
> We have read_repair_chance = 0, unchecked_tombstone_compaction = true,
> and gc_grace_seconds = 3h, but we read and write with consistency = 1.
>
> I suspect the overlap is coming from either hinted handoff or a repair
> job we run nightly.
>
> 1) Is running repair with TWCS recommended? It seems like it will always
> create a never-ending overlap (the repair SSTable will have data from all
> 24 hours), an effect that seems to get amplified by anti-compaction.
> 2) TWCS seems to introduce a tradeoff between eventual consistency and
> write/read availability.
> If all repairs are turned off, then the choice is either (a) use a strong
> consistency level and pay the price of lower availability and slower
> reads and writes, or (b) use a lower consistency level and risk
> inconsistent data (the data is never repaired).
>
> I will try your last link, but reappearing data sounds a bit scary :)
>
> Any advice on how to debug this further would be greatly appreciated.
>
> Cheers,
> Eugene
>
> On Fri, Oct 6, 2017 at 11:02 AM, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
>> Hi Eugene,
>>
>>> If we never use updates (time series data), is it safe to set
>>> gc_grace_seconds=0.
>>
>> As Kurt pointed out, you never want 'gc_grace_seconds' to be lower than
>> 'max_hint_window_in_ms', as the min of these two values is used as the
>> hint storage window in Apache Cassandra.
>>
>> Yet time series data with fixed TTLs allows a very efficient use of
>> Cassandra, especially when using Time Window Compaction Strategy (TWCS).
>> Fun fact: Jeff brought it to Apache Cassandra :-). I would definitely
>> give it a try.
>>
>> Here is a post from my colleague Alex that I believe could be useful in
>> your case: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
>>
>> Using TWCS and lowering 'gc_grace_seconds' to the value of
>> 'max_hint_window_in_ms' should be really effective. Make sure to use a
>> strong consistency level (generally RF = 3, CL.Read = CL.Write =
>> LOCAL_QUORUM) to prevent inconsistencies, I would say (depending on your
>> interest in consistency).
>>
>> This way you could expire entire SSTables without compaction. If
>> overlaps in SSTables become a problem, you could even consider giving a
>> try to a more aggressive SSTable expiration:
>> https://issues.apache.org/jira/browse/CASSANDRA-13418.
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>> France / Spain
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>>
>> 2017-10-05 23:44 GMT+01:00 kurt greaves <k...@instaclustr.com>:
>>
>>> No, it's never safe to set it to 0, as you'll disable hinted handoff
>>> for the table. If you never do updates or manual deletes and you always
>>> insert with a TTL, you can get away with setting it to the hinted
>>> handoff period.
>>>
>>> On 6 Oct. 2017 1:28 am, "eugene miretsky" <eugene.miret...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Jeff,
>>>>
>>>> Makes sense.
>>>> If we never use updates (time series data), is it safe to set
>>>> gc_grace_seconds=0?
>>>>
>>>> On Wed, Oct 4, 2017 at 5:59 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>
>>>>> The TTL'd cell is treated as a tombstone. gc_grace_seconds applies to
>>>>> TTL'd cells because, even though the data is TTL'd, it may have been
>>>>> written on top of another live cell that wasn't TTL'd.
>>>>>
>>>>> Imagine a test table, a simple key->value (k, v):
>>>>>
>>>>> INSERT INTO table(k,v) VALUES (1,1);
>>>>> Kill 1 of the 3 nodes.
>>>>> UPDATE table USING TTL 60 SET v=1 WHERE k=1;
>>>>>
>>>>> 60 seconds later, the live nodes will see that data as deleted, but
>>>>> when that dead node comes back to life, it needs to learn of the
>>>>> deletion.
>>>>>
>>>>> On Wed, Oct 4, 2017 at 2:05 PM, eugene miretsky <
>>>>> eugene.miret...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> The following link says that TTLs generate tombstones -
>>>>>> https://docs.datastax.com/en/cql/3.3/cql/cql_using/useExpire.html.
>>>>>>
>>>>>> What exactly is the process that converts the TTL into a tombstone?
>>>>>>
>>>>>> 1. Is an actual new tombstone cell created when the TTL expires?
>>>>>> 2. Or is the TTLed cell treated as a tombstone?
>>>>>>
>>>>>> Also, does gc_grace_period have an effect on TTLed cells?
>>>>>> gc_grace_period is meant to protect against deleted data reappearing
>>>>>> if the tombstone is compacted away before all nodes have reached a
>>>>>> consistent state. However, since the TTL is stored in the cell (in
>>>>>> liveness_info), there is no way for the cell to reappear (the TTL
>>>>>> will still be there).
>>>>>>
>>>>>> Cheers,
>>>>>> Eugene
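Jeff's failure scenario above can be sketched as a toy model: a TTL'd update overwrites a live, non-TTL'd cell while one replica is down, and if the expired cell is purged before the dead replica returns, the old value wins on merge. This is an illustrative simulation only (the names and dict-based "cells" are invented for the sketch), not Cassandra's actual storage engine:

```python
# Toy model of resurrection when gc_grace is effectively 0.
# All names here are illustrative; real Cassandra merges cells per column
# with tombstones, not whole-row dicts.

def merge_replicas(replicas):
    """Merge replica states: the cell with the highest timestamp wins."""
    live = [c for c in replicas.values() if c is not None]
    return max(live, key=lambda c: c["ts"]) if live else None

# Three replicas hold the original, non-TTL'd write (v=1 at ts=1).
cell_v1 = {"v": 1, "ts": 1, "expired": False}
replicas = {"n1": cell_v1, "n2": cell_v1, "n3": cell_v1}

# n3 goes down; the TTL'd update (ts=2) lands only on n1 and n2.
cell_ttl = {"v": 1, "ts": 2, "expired": True}  # TTL elapsed -> acts as delete
replicas["n1"] = cell_ttl
replicas["n2"] = cell_ttl

# With no grace period, compaction purges the expired cell on the live
# nodes before n3 comes back:
replicas["n1"] = None
replicas["n2"] = None

# n3 returns still holding the old live cell -> the "deleted" value wins.
resurrected = merge_replicas(replicas)
print(resurrected)  # -> {'v': 1, 'ts': 1, 'expired': False}
```

With a gc_grace_seconds at least as long as the repair/hint window, the expired cell at ts=2 would still be present on n1/n2 and would shadow n3's stale cell instead.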