Re: Max number of windows when using TWCS

2019-02-11 Thread Osman YOZGATLIOĞLU
Hello,

By the way, about https://issues.apache.org/jira/browse/CASSANDRA-13418, I'm 
not sure how to apply this solution.

Do you have a guide about it?


Regards,

Osman


On 12.02.2019 01:42, Nitan Kainth wrote:
That’s right, Jeff. That’s why I am wondering: why doesn't compaction get rid of old 
expired sstables?


Regards,
Nitan
Cell: 510 449 9629

On Feb 11, 2019, at 3:53 PM, Jeff Jirsa <jji...@gmail.com> wrote:

It's probably not safe. You shouldn't touch the underlying sstables unless 
you're very sure you know what you're doing.


On Mon, Feb 11, 2019 at 1:05 PM Akash Gangil <akashg1...@gmail.com> wrote:
I have in the past tried to delete SSTables manually, but have noticed bits and 
pieces of that data still remain, even though the sstables of that window is 
deleted. So always wondered if playing directly with the underlying filesystem 
is a safe bet?


On Mon, Feb 11, 2019 at 1:01 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
Deleting SSTables manually can be useful if you don't know your TTL up front.  
For example, you have an ETL process that moves your raw Cassandra data into S3 
as parquet files, and you want to be sure that process is completed before you 
delete the data.  You could also start out without setting a TTL and later 
realize you need one.  This is a remarkably common problem.

On Mon, Feb 11, 2019 at 12:51 PM Nitan Kainth <nitankai...@gmail.com> wrote:
Jeff,

It means we have to delete sstables manually?


Regards,
Nitan
Cell: 510 449 9629

On Feb 11, 2019, at 2:40 PM, Jeff Jirsa <jji...@gmail.com> wrote:

There's a bit of headache around overlapping sstables being strictly safe to 
delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was added to 
allow the "I know it's not technically safe, but just delete it anyway" use 
case. For a lot of people who started using TWCS before 13418, "stop cassandra, 
remove stuff we know is expired, start cassandra" is a not-uncommon pattern in 
very high-write, high-disk-space use cases.



On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth <nitankai...@gmail.com> wrote:
Hi,
In regards to comment “Purging data is also straightforward, just dropping 
SSTables (by a script) where create date is older than a threshold, we don't 
even need to rely on TTL”

Doesn’t the old sstables drop by itself? One ttl and gc grace seconds past 
whole sstable will have only tombstones.


Regards,
Nitan
Cell: 510 449 9629

On Feb 11, 2019, at 2:23 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

Purging data is also straightforward, just dropping SSTables (by a script) 
where create date is older than a threshold, we don't even need to rely on TTL


--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


--
Akash


Re: Max number of windows when using TWCS

2019-02-11 Thread Nitan Kainth
That’s right, Jeff. That’s why I am wondering: why doesn't compaction get rid of old 
expired sstables?


Regards,
Nitan
Cell: 510 449 9629

> On Feb 11, 2019, at 3:53 PM, Jeff Jirsa  wrote:
> 
> It's probably not safe. You shouldn't touch the underlying sstables unless 
> you're very sure you know what you're doing.
> 
> 
>> On Mon, Feb 11, 2019 at 1:05 PM Akash Gangil  wrote:
>> I have in the past tried to delete SSTables manually, but have noticed bits 
>> and pieces of that data still remain, even though the sstables of that 
>> window is deleted. So always wondered if playing directly with the 
>> underlying filesystem is a safe bet?
>> 
>> 
>>> On Mon, Feb 11, 2019 at 1:01 PM Jonathan Haddad  wrote:
>>> Deleting SSTables manually can be useful if you don't know your TTL up 
>>> front.  For example, you have an ETL process that moves your raw Cassandra 
>>> data into S3 as parquet files, and you want to be sure that process is 
>>> completed before you delete the data.  You could also start out without 
>>> setting a TTL and later realize you need one.  This is a remarkably common 
>>> problem.
>>> 
>>>> On Mon, Feb 11, 2019 at 12:51 PM Nitan Kainth wrote:
>>>> Jeff,
>>>> 
>>>> It means we have to delete sstables manually?
>>>> 
>>>> 
>>>> Regards,
>>>> Nitan
>>>> Cell: 510 449 9629
>>>> 
>>>>> On Feb 11, 2019, at 2:40 PM, Jeff Jirsa wrote:
>>>>> 
>>>>> There's a bit of headache around overlapping sstables being strictly safe 
>>>>> to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was 
>>>>> added to allow the "I know it's not technically safe, but just delete it 
>>>>> anyway" use case. For a lot of people who started using TWCS before 
>>>>> 13418, "stop cassandra, remove stuff we know is expired, start cassandra" 
>>>>> is a not-uncommon pattern in very high-write, high-disk-space use cases. 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth wrote:
>>>>>> Hi,
>>>>>> In regards to comment “Purging data is also straightforward, just 
>>>>>> dropping SSTables (by a script) where create date is older than a 
>>>>>> threshold, we don't even need to rely on TTL”
>>>>>> 
>>>>>> Doesn’t the old sstables drop by itself? One ttl and gc grace seconds 
>>>>>> past whole sstable will have only tombstones.
>>>>>> 
>>>>>> 
>>>>>> Regards,
>>>>>> Nitan
>>>>>> Cell: 510 449 9629
>>>>>> 
>>>>>>> On Feb 11, 2019, at 2:23 PM, DuyHai Doan wrote:
>>>>>>> 
>>>>>>> Purging data is also straightforward, just dropping SSTables (by a 
>>>>>>> script) where create date is older than a threshold, we don't even need 
>>>>>>> to rely on TTL
>>>>>>> 
>>>>>>> 
>>> -- 
>>> Jon Haddad
>>> http://www.rustyrazorblade.com
>>> twitter: rustyrazorblade
>> 
>> 
>> -- 
>> Akash


Re: Max number of windows when using TWCS

2019-02-11 Thread Jeff Jirsa
It's probably not safe. You shouldn't touch the underlying sstables unless
you're very sure you know what you're doing.


On Mon, Feb 11, 2019 at 1:05 PM Akash Gangil  wrote:

> I have in the past tried to delete SSTables manually, but have noticed
> bits and pieces of that data still remain, even though the sstables of that
> window is deleted. So always wondered if playing directly with the
> underlying filesystem is a safe bet?
>
>
> On Mon, Feb 11, 2019 at 1:01 PM Jonathan Haddad  wrote:
>
>> Deleting SSTables manually can be useful if you don't know your TTL up
>> front.  For example, you have an ETL process that moves your raw Cassandra
>> data into S3 as parquet files, and you want to be sure that process is
>> completed before you delete the data.  You could also start out without
>> setting a TTL and later realize you need one.  This is a remarkably common
>> problem.
>>
>> On Mon, Feb 11, 2019 at 12:51 PM Nitan Kainth 
>> wrote:
>>
>>> Jeff,
>>>
>>> It means we have to delete sstables manually?
>>>
>>>
>>> Regards,
>>>
>>> Nitan
>>>
>>> Cell: 510 449 9629
>>>
>>> On Feb 11, 2019, at 2:40 PM, Jeff Jirsa  wrote:
>>>
>>> There's a bit of headache around overlapping sstables being strictly
>>> safe to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418
>>> was added to allow the "I know it's not technically safe, but just delete
>>> it anyway" use case. For a lot of people who started using TWCS before
>>> 13418, "stop cassandra, remove stuff we know is expired, start cassandra"
>>> is a not-uncommon pattern in very high-write, high-disk-space use cases.
>>>
>>>
>>>
>>> On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth 
>>> wrote:
>>>
>>>> Hi,
>>>> In regards to comment “Purging data is also straightforward, just
>>>> dropping SSTables (by a script) where create date is older than a
>>>> threshold, we don't even need to rely on TTL”
>>>>
>>>> Doesn’t the old sstables drop by itself? One ttl and gc grace seconds
>>>> past whole sstable will have only tombstones.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Nitan
>>>>
>>>> Cell: 510 449 9629
>>>>
>>>> On Feb 11, 2019, at 2:23 PM, DuyHai Doan wrote:
>>>>
>>>> Purging data is also straightforward, just dropping SSTables (by a
>>>> script) where create date is older than a threshold, we don't even need to
>>>> rely on TTL


>>
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
>>
>
>
> --
> Akash
>


Re: Max number of windows when using TWCS

2019-02-11 Thread Akash Gangil
In the past I have tried to delete SSTables manually, but have noticed that bits
and pieces of that data still remain, even though the sstables of that
window are deleted. So I have always wondered whether playing directly with the
underlying filesystem is a safe bet.


On Mon, Feb 11, 2019 at 1:01 PM Jonathan Haddad  wrote:

> Deleting SSTables manually can be useful if you don't know your TTL up
> front.  For example, you have an ETL process that moves your raw Cassandra
> data into S3 as parquet files, and you want to be sure that process is
> completed before you delete the data.  You could also start out without
> setting a TTL and later realize you need one.  This is a remarkably common
> problem.
>
> On Mon, Feb 11, 2019 at 12:51 PM Nitan Kainth 
> wrote:
>
>> Jeff,
>>
>> It means we have to delete sstables manually?
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On Feb 11, 2019, at 2:40 PM, Jeff Jirsa  wrote:
>>
>> There's a bit of headache around overlapping sstables being strictly safe
>> to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was
>> added to allow the "I know it's not technically safe, but just delete it
>> anyway" use case. For a lot of people who started using TWCS before 13418,
>> "stop cassandra, remove stuff we know is expired, start cassandra" is a
>> not-uncommon pattern in very high-write, high-disk-space use cases.
>>
>>
>>
>> On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth 
>> wrote:
>>
>>> Hi,
>>> In regards to comment “Purging data is also straightforward, just
>>> dropping SSTables (by a script) where create date is older than a
>>> threshold, we don't even need to rely on TTL”
>>>
>>> Doesn’t the old sstables drop by itself? One ttl and gc grace seconds
>>> past whole sstable will have only tombstones.
>>>
>>>
>>> Regards,
>>>
>>> Nitan
>>>
>>> Cell: 510 449 9629
>>>
>>> On Feb 11, 2019, at 2:23 PM, DuyHai Doan  wrote:
>>>
>>> Purging data is also straightforward, just dropping SSTables (by a
>>> script) where create date is older than a threshold, we don't even need to
>>> rely on TTL
>>>
>>>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>


-- 
Akash


Re: Max number of windows when using TWCS

2019-02-11 Thread Jonathan Haddad
Deleting SSTables manually can be useful if you don't know your TTL up
front.  For example, you have an ETL process that moves your raw Cassandra
data into S3 as parquet files, and you want to be sure that process is
completed before you delete the data.  You could also start out without
setting a TTL and later realize you need one.  This is a remarkably common
problem.

On Mon, Feb 11, 2019 at 12:51 PM Nitan Kainth  wrote:

> Jeff,
>
> It means we have to delete sstables manually?
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On Feb 11, 2019, at 2:40 PM, Jeff Jirsa  wrote:
>
> There's a bit of headache around overlapping sstables being strictly safe
> to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was
> added to allow the "I know it's not technically safe, but just delete it
> anyway" use case. For a lot of people who started using TWCS before 13418,
> "stop cassandra, remove stuff we know is expired, start cassandra" is a
> not-uncommon pattern in very high-write, high-disk-space use cases.
>
>
>
> On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth 
> wrote:
>
>> Hi,
>> In regards to comment “Purging data is also straightforward, just
>> dropping SSTables (by a script) where create date is older than a
>> threshold, we don't even need to rely on TTL”
>>
>> Doesn’t the old sstables drop by itself? One ttl and gc grace seconds
>> past whole sstable will have only tombstones.
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On Feb 11, 2019, at 2:23 PM, DuyHai Doan  wrote:
>>
>> Purging data is also straightforward, just dropping SSTables (by a
>> script) where create date is older than a threshold, we don't even need to
>> rely on TTL
>>
>>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Max number of windows when using TWCS

2019-02-11 Thread Nitan Kainth
Jeff,

It means we have to delete sstables manually?


Regards,
Nitan
Cell: 510 449 9629

> On Feb 11, 2019, at 2:40 PM, Jeff Jirsa  wrote:
> 
> There's a bit of headache around overlapping sstables being strictly safe to 
> delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was added to 
> allow the "I know it's not technically safe, but just delete it anyway" use 
> case. For a lot of people who started using TWCS before 13418, "stop 
> cassandra, remove stuff we know is expired, start cassandra" is a 
> not-uncommon pattern in very high-write, high-disk-space use cases. 
> 
> 
> 
>> On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth  wrote:
>> Hi,
>> In regards to comment “Purging data is also straightforward, just dropping 
>> SSTables (by a script) where create date is older than a threshold, we don't 
>> even need to rely on TTL”
>> 
>> Doesn’t the old sstables drop by itself? One ttl and gc grace seconds past 
>> whole sstable will have only tombstones.
>> 
>> 
>> Regards,
>> Nitan
>> Cell: 510 449 9629
>> 
>>> On Feb 11, 2019, at 2:23 PM, DuyHai Doan  wrote:
>>> 
>>> Purging data is also straightforward, just dropping SSTables (by a script) 
>>> where create date is older than a threshold, we don't even need to rely on 
>>> TTL


Re: Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
thanks for the pointer Jeff

On Mon, Feb 11, 2019 at 9:40 PM Jeff Jirsa  wrote:

> There's a bit of headache around overlapping sstables being strictly safe
> to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was
> added to allow the "I know it's not technically safe, but just delete it
> anyway" use case. For a lot of people who started using TWCS before 13418,
> "stop cassandra, remove stuff we know is expired, start cassandra" is a
> not-uncommon pattern in very high-write, high-disk-space use cases.
>
>
>
> On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth 
> wrote:
>
>> Hi,
>> In regards to comment “Purging data is also straightforward, just
>> dropping SSTables (by a script) where create date is older than a
>> threshold, we don't even need to rely on TTL”
>>
>> Doesn’t the old sstables drop by itself? One ttl and gc grace seconds
>> past whole sstable will have only tombstones.
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On Feb 11, 2019, at 2:23 PM, DuyHai Doan  wrote:
>>
>> Purging data is also straightforward, just dropping SSTables (by a
>> script) where create date is older than a threshold, we don't even need to
>> rely on TTL
>>
>>


Re: Max number of windows when using TWCS

2019-02-11 Thread Jeff Jirsa
There's a bit of headache around overlapping sstables being strictly safe
to delete.  https://issues.apache.org/jira/browse/CASSANDRA-13418 was added
to allow the "I know it's not technically safe, but just delete it anyway"
use case. For a lot of people who started using TWCS before 13418, "stop
cassandra, remove stuff we know is expired, start cassandra" is a
not-uncommon pattern in very high-write, high-disk-space use cases.
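
For readers wondering how CASSANDRA-13418 is applied in practice, here is a rough sketch rather than an official guide. The ticket adds a TWCS sub-option named unsafe_aggressive_sstable_expiration. It also has to be allowed by a JVM system property at node startup; that property's exact name is not given in this thread and should be checked against the documentation for your Cassandra version. The keyspace and table names, the window settings, and the helper function below are hypothetical, and the helper only builds the CQL text; it does not talk to a cluster.

```python
def twcs_alter_statement(keyspace, table, window_unit="DAYS", window_size=1,
                         aggressive_expiration=False):
    """Build an ALTER TABLE statement that sets TWCS compaction options.

    keyspace/table are placeholders supplied by the caller (hypothetical here).
    """
    options = {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": window_unit,
        "compaction_window_size": str(window_size),
    }
    if aggressive_expiration:
        # Option added by CASSANDRA-13418: drop fully expired sstables
        # without the usual overlap check. Not strictly safe.
        options["unsafe_aggressive_sstable_expiration"] = "true"
    rendered = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{rendered}}};"


if __name__ == "__main__":
    print(twcs_alter_statement("metrics", "events", aggressive_expiration=True))
```

As noted above, this skips the overlap check, so it only makes sense when you are sure the expired sstables do not shadow newer data.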



On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth  wrote:

> Hi,
> In regards to comment “Purging data is also straightforward, just
> dropping SSTables (by a script) where create date is older than a
> threshold, we don't even need to rely on TTL”
>
> Doesn’t the old sstables drop by itself? One ttl and gc grace seconds past
> whole sstable will have only tombstones.
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On Feb 11, 2019, at 2:23 PM, DuyHai Doan  wrote:
>
> Purging data is also straightforward, just dropping SSTables (by a script)
> where create date is older than a threshold, we don't even need to rely on
> TTL
>
>


Re: Max number of windows when using TWCS

2019-02-11 Thread Nitan Kainth
Hi,
In regards to comment “Purging data is also straightforward, just dropping 
SSTables (by a script) where create date is older than a threshold, we don't 
even need to rely on TTL”

Don’t the old sstables drop by themselves? Once the TTL and gc grace seconds have 
passed, the whole sstable will have only tombstones.


Regards,
Nitan
Cell: 510 449 9629

> On Feb 11, 2019, at 2:23 PM, DuyHai Doan  wrote:
> 
> Purging data is also straightforward, just dropping SSTables (by a script) 
> where create date is older than a threshold, we don't even need to rely on TTL


Re: Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
No worries about overlapping: the use case is events/timeseries and there
is almost no delay, so it should be fine.

As a side note, since we are guaranteed to have 1 SSTable per day of
ingestion, it is very easy to "emulate" incremental backup. You just need
to find the generated SSTable with the latest create date and back it up
every day at midnight with a script.

Purging data is also straightforward, just dropping SSTables (by a script)
where create date is older than a threshold, we don't even need to rely on
TTL
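
A minimal sketch of the "find SSTables older than a threshold" step, written as a dry run that only lists candidates and never deletes anything. It assumes that each SSTable's data component ends in -Data.db, that the file's modification time is an acceptable stand-in for its create date, and that the script is pointed at a single table directory; the function name and the example path are hypothetical. Manually removing files this way is not strictly safe when sstables overlap, and is normally done with the node stopped.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def sstables_older_than(table_dir, max_age_days, now=None):
    """Return (data_file_name, component_names) for SSTables older than the cutoff.

    Age is judged by the mtime of the *-Data.db file (an assumption, see above).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    candidates = []
    for data_file in sorted(Path(table_dir).glob("*-Data.db")):
        if data_file.stat().st_mtime < cutoff:
            # All components of one SSTable share the prefix before "Data.db".
            prefix = data_file.name[: -len("Data.db")]
            components = sorted(p.name for p in data_file.parent.glob(prefix + "*"))
            candidates.append((data_file.name, components))
    return candidates


if __name__ == "__main__":
    # Hypothetical path; adjust to your data directory layout.
    for name, components in sstables_older_than("/var/lib/cassandra/data/ks/tbl", 365):
        print(name, components)
```

The same listing, sorted by modification time and reduced to its newest entry, could serve as the starting point for the daily backup script.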



On Mon, Feb 11, 2019 at 9:19 PM Jeff Jirsa  wrote:

> Wild ass guess based on a large use case I knew about at the time
>
> If you go above that, I expect it’d largely be fine as long as you were
> sure they weren’t overlapping so reads only ever touched a small subset of
> the windows (ideally 1).
>
> If you have one day windows and every read touches all of the windows,
> you’re going to have a bad time.
>
> --
> Jeff Jirsa
>
>
> On Feb 11, 2019, at 12:12 PM, DuyHai Doan  wrote:
>
> Hello users
>
> On the official documentation for TWCS (
> http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy)
> it is advised to select the windows unit and size so that the total number
> of windows intervals is around 20-30.
>
> Is there any explanation for this range of 20-30? What if we exceed this
> range, let's say having 1 day windows and keeping data for 1 year, thus
> having 365 intervals? What can go wrong with this?
>
> Regards
>
> Duy Hai DOAN
>
>


Re: Max number of windows when using TWCS

2019-02-11 Thread Jeff Jirsa
Wild ass guess based on a large use case I knew about at the time

If you go above that, I expect it’d largely be fine as long as you were sure 
they weren’t overlapping so reads only ever touched a small subset of the 
windows (ideally 1).

If you have one day windows and every read touches all of the windows, you’re 
going to have a bad time. 

-- 
Jeff Jirsa


> On Feb 11, 2019, at 12:12 PM, DuyHai Doan  wrote:
> 
> Hello users
> 
> On the official documentation for TWCS 
> (http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy)
>  it is advised to select the windows unit and size so that the total number 
> of windows intervals is around 20-30.
> 
> Is there any explanation for this range of 20-30? What if we exceed this 
> range, let's say having 1 day windows and keeping data for 1 year, thus having 
> 365 intervals? What can go wrong with this?
> 
> Regards
> 
> Duy Hai DOAN


Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
Hello users

On the official documentation for TWCS (
http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy)
it is advised to select the windows unit and size so that the total number
of windows intervals is around 20-30.

Is there any explanation for this range of 20-30? What if we exceed this
range, let's say having 1 day windows and keeping data for 1 year, thus
having 365 intervals? What can go wrong with this?

Regards

Duy Hai DOAN
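
For reference, the arithmetic behind the question is simply retention period divided by window size, rounded up. The small sketch below compares a few window sizes for one year of retention; the function name is hypothetical and the candidate sizes are arbitrary examples, not recommendations.

```python
import math
from datetime import timedelta


def window_count(retention, window):
    """Number of TWCS time windows needed to cover the retention period."""
    return math.ceil(retention / window)


if __name__ == "__main__":
    retention = timedelta(days=365)
    for days in (1, 7, 14):
        count = window_count(retention, timedelta(days=days))
        print(f"{days}-day windows over 1 year: {count} windows")
```

With one year of retention, 1-day windows give 365 windows, while 14-day windows give 27, which falls inside the 20-30 range mentioned in the documentation.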