Re: TWCS and autocompaction

2018-01-16 Thread Alexander Dejanovski
The ticket I was referring to is the following:
https://issues.apache.org/jira/browse/CASSANDRA-13418

It's been merged in 3.11.1, so just make sure you enable
unsafe_aggressive_sstable_expiration and you'll evict expired SSTables
regardless of overlaps (and IMHO it's totally safe to do this).
Do not ever run major compactions on TWCS tables unless you have a really,
really valid reason, and do not ever disable autocompaction on any table
for a long time.

Foreground read repair will still happen, regardless of your settings, when
reading at QUORUM or LOCAL_QUORUM; that's just part of the read path.
read_repair_chance and dc_read_repair_chance set to 0.0 will only disable
background read repair, which also happens at other consistency levels.

Currently, you have a default TTL of 1555200 seconds (18 days) and a 4-hour
time window, which can create up to 108 live buckets.
The advice Jeff Jirsa gave back in the day is to try to keep the number of
live buckets between 50 and 60, which means you should double the size of
your time windows to 8 hours.
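
To make the arithmetic concrete, here is a small shell sketch that estimates the number of live buckets as the TTL divided by the window size. The function name is illustrative only and not part of any Cassandra tooling; the TTL value is the default_time_to_live from the table settings.

```shell
#!/bin/sh
# Rough estimate of live TWCS buckets: TTL divided by window size.
# live_buckets is a hypothetical helper, not a Cassandra command.
live_buckets() {
    ttl_seconds=$1
    window_hours=$2
    echo $(( ttl_seconds / (window_hours * 3600) ))
}

live_buckets 1555200 4   # 4-hour windows
live_buckets 1555200 8   # 8-hour windows
```

This prints 108 and then 54, so 8-hour windows land inside the 50 to 60 range.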

If you end up with 100 SSTables, then TWCS is properly doing its work,
keeping in mind that the current time window can/will have more than one
SSTable. Major compaction within a bucket will happen once it gets out of
the current time window.

Cheers,


-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: TWCS and autocompaction

2018-01-16 Thread Cogumelos Maravilha
Hi,

My read_repair_chance is 0 (AND read_repair_chance = 0.0)

When I bootstrap a new node there are around 700 SSTables, but after auto
compaction the number drops to around 100.

I'm using C* 3.11.1. To solve the problem I've already changed to
'unchecked_tombstone_compaction': 'true'. Should I now run nodetool compact?

And in the future, should I run nodetool disableautocompaction from crontab?

Thanks





Re: TWCS and autocompaction

2018-01-16 Thread Alexander Dejanovski
Hi,

The overlaps you're seeing on time windows aren't due to automatic
compactions, but to read repairs.
You must be reading at QUORUM or LOCAL_QUORUM, which can perform foreground
read repair in case of a digest mismatch.

You can set unchecked_tombstone_compaction to true if you want to perform
single-SSTable compactions to purge tombstones. A patch has also recently
been merged to allow TWCS to delete fully expired data even when time
windows overlap (I can't remember if it's been merged in 3.11.1).
Just so you know, the timestamp considered for time windows is the max
timestamp. You can have old data in recent time windows, but not the
opposite.
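
As a rough illustration of how an SSTable lands in a window, the sketch below rounds a max timestamp (in epoch seconds) down to the start of its window. It assumes windows are aligned to the Unix epoch, which is my understanding of TWCS for HOURS windows rather than something stated in this thread. The function name and the sample timestamp are hypothetical.

```shell
#!/bin/sh
# Round an SSTable's max timestamp (epoch seconds) down to the start of
# its time window. Assumes epoch-aligned windows; window_start is an
# illustrative helper, not a Cassandra tool.
window_start() {
    max_ts=$1
    window_hours=$2
    window_seconds=$(( window_hours * 3600 ))
    echo $(( max_ts - (max_ts % window_seconds) ))
}

# 1515959861 is 2018-01-14 19:57:41 UTC; with 4-hour windows it falls
# in the window that starts at 16:00 UTC.
window_start 1515959861 4
```

Under that assumption, only the max timestamp decides the window, so an SSTable holding mostly old rows still sits in a recent window if it contains at least one recent write.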

Cheers,



TWCS and autocompaction

2018-01-16 Thread Cogumelos Maravilha
Hi list,

My settings:

AND compaction = {'class':
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
'compaction_window_size': '4', 'compaction_window_unit': 'HOURS',
'enabled': 'true', 'max_threshold': '64', 'min_threshold': '2',
'tombstone_compaction_interval': '15000', 'tombstone_threshold': '0.2',
'unchecked_tombstone_compaction': 'false'}
    AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 0.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 1555200
    AND gc_grace_seconds = 10800
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Running this script:

for f in *Data.db; do
   ls -l "$f"
   output=$(sstablemetadata "$f" 2>/dev/null)
   # Timestamps are in microseconds; keep the first 10 digits (seconds).
   max=$(echo "$output" | grep 'Maximum timestamp' | cut -d" " -f3 | cut -c 1-10)
   min=$(echo "$output" | grep 'Minimum timestamp' | cut -d" " -f3 | cut -c 1-10)
   date -d "@$max" +'%d/%m/%Y %H:%M:%S'
   date -d "@$min" +'%d/%m/%Y %H:%M:%S'
done

on sstables I'm getting values like these:

-rw-r--r-- 1 cassandra cassandra 12137573577 Jan 14 20:08 mc-22750-big-Data.db
14/01/2018 19:57:41
31/12/2017 19:06:48

-rw-r--r-- 1 cassandra cassandra 4669422106 Jan 14 06:55 mc-22322-big-Data.db
12/01/2018 07:59:57
28/12/2017 19:08:42

My goal is to use TWCS so that SSTables expire quickly, because lots of new
data is coming in. What is the best approach to achieve that? Should I
disable auto compaction?
Thanks in advance.

