Re: Max number of windows when using TWCS
Hello,

By the way, regarding https://issues.apache.org/jira/browse/CASSANDRA-13418, I'm not sure how to apply this solution. Is there a guide for it?

Regards,
Osman

On 12.02.2019 01:42, Nitan Kainth wrote:

> That’s right, Jeff. That’s why I am wondering: why doesn’t compaction get rid of the old expired sstables?
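On the question of how to apply CASSANDRA-13418: as far as I understand the ticket, it adds a TWCS sub-option named unsafe_aggressive_sstable_expiration that is set per table, and the node additionally has to be started with a JVM system property that allows it (I believe it is cassandra.allow_unsafe_aggressive_sstable_expiration, but please check the ticket and the documentation for your version). Below is a minimal sketch that only builds the ALTER TABLE statement; the helper name and the keyspace/table names are made up for illustration, and nothing here is an official guide.

```python
def twcs_alter_statement(keyspace, table, window_unit, window_size,
                         aggressive_expiration=False):
    """Build a CQL ALTER TABLE statement that sets TWCS compaction options.

    Assumption: 'unsafe_aggressive_sstable_expiration' is the sub-option
    introduced by CASSANDRA-13418. Verify it against your Cassandra version.
    """
    options = {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": window_unit,
        "compaction_window_size": str(window_size),
    }
    if aggressive_expiration:
        # Drops expired SSTables without the overlap check: not strictly safe.
        options["unsafe_aggressive_sstable_expiration"] = "true"
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{body}}};"


# Hypothetical keyspace and table names.
print(twcs_alter_statement("metrics", "events", "DAYS", 1,
                           aggressive_expiration=True))
```

The printed statement can be pasted into cqlsh. As noted elsewhere in this thread, the option exists for the "I know it's not technically safe, but just delete it anyway" case, so it is worth trying on a test cluster first.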
Re: Max number of windows when using TWCS
That’s right, Jeff. That’s why I am wondering: why doesn’t compaction get rid of the old expired sstables?

Regards,
Nitan

> On Feb 11, 2019, at 3:53 PM, Jeff Jirsa wrote:
>
> It's probably not safe. You shouldn't touch the underlying sstables unless you're very sure you know what you're doing.
Re: Max number of windows when using TWCS
It's probably not safe. You shouldn't touch the underlying sstables unless you're very sure you know what you're doing.

On Mon, Feb 11, 2019 at 1:05 PM Akash Gangil wrote:

> I have tried to delete SSTables manually in the past, but noticed that bits and pieces of the data still remained even though the sstables of that window were deleted. So I have always wondered: is working directly on the underlying filesystem a safe bet?
Re: Max number of windows when using TWCS
I have tried to delete SSTables manually in the past, but noticed that bits and pieces of the data still remained even though the sstables of that window were deleted. So I have always wondered: is working directly on the underlying filesystem a safe bet?

On Mon, Feb 11, 2019 at 1:01 PM Jonathan Haddad wrote:

> Deleting SSTables manually can be useful if you don't know your TTL up front. For example, you have an ETL process that moves your raw Cassandra data into S3 as parquet files, and you want to be sure that process is completed before you delete the data. You could also start out without setting a TTL and later realize you need one. This is a remarkably common problem.

--
Akash
Re: Max number of windows when using TWCS
Deleting SSTables manually can be useful if you don't know your TTL up front. For example, you have an ETL process that moves your raw Cassandra data into S3 as parquet files, and you want to be sure that process is completed before you delete the data. You could also start out without setting a TTL and later realize you need one. This is a remarkably common problem.

On Mon, Feb 11, 2019 at 12:51 PM Nitan Kainth wrote:

> Jeff,
>
> Does that mean we have to delete sstables manually?

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
Re: Max number of windows when using TWCS
Jeff,

Does that mean we have to delete sstables manually?

Regards,
Nitan

> On Feb 11, 2019, at 2:40 PM, Jeff Jirsa wrote:
>
> There's a bit of a headache around whether overlapping sstables are strictly safe to delete. https://issues.apache.org/jira/browse/CASSANDRA-13418 was added to allow the "I know it's not technically safe, but just delete it anyway" use case. For a lot of people who started using TWCS before 13418, "stop cassandra, remove stuff we know is expired, start cassandra" is a not-uncommon pattern in very high-write, high-disk-space use cases.
Re: Max number of windows when using TWCS
Thanks for the pointer, Jeff.

On Mon, Feb 11, 2019 at 9:40 PM Jeff Jirsa wrote:

> There's a bit of a headache around whether overlapping sstables are strictly safe to delete. https://issues.apache.org/jira/browse/CASSANDRA-13418 was added to allow the "I know it's not technically safe, but just delete it anyway" use case. For a lot of people who started using TWCS before 13418, "stop cassandra, remove stuff we know is expired, start cassandra" is a not-uncommon pattern in very high-write, high-disk-space use cases.
Re: Max number of windows when using TWCS
There's a bit of a headache around whether overlapping sstables are strictly safe to delete. https://issues.apache.org/jira/browse/CASSANDRA-13418 was added to allow the "I know it's not technically safe, but just delete it anyway" use case. For a lot of people who started using TWCS before 13418, "stop cassandra, remove stuff we know is expired, start cassandra" is a not-uncommon pattern in very high-write, high-disk-space use cases.

On Mon, Feb 11, 2019 at 12:34 PM Nitan Kainth wrote:

> Hi,
>
> Regarding the comment “Purging data is also straightforward: just drop SSTables (with a script) whose create date is older than a threshold. We don't even need to rely on TTL.”
>
> Don’t the old sstables get dropped by themselves? Once the TTL and gc_grace_seconds have passed, the whole sstable will contain only tombstones.
Re: Max number of windows when using TWCS
Hi,

Regarding the comment “Purging data is also straightforward: just drop SSTables (with a script) whose create date is older than a threshold. We don't even need to rely on TTL.”

Don’t the old sstables get dropped by themselves? Once the TTL and gc_grace_seconds have passed, the whole sstable will contain only tombstones.

Regards,
Nitan

> On Feb 11, 2019, at 2:23 PM, DuyHai Doan wrote:
>
> Purging data is also straightforward: just drop SSTables (with a script) whose create date is older than a threshold. We don't even need to rely on TTL.
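To put numbers on the "TTL plus gc_grace_seconds" reasoning in the message above, here is a minimal sketch of the arithmetic. It assumes the simplified rule that an SSTable holds only droppable tombstones once its newest write is older than TTL + gc_grace_seconds, and it deliberately ignores the overlap check that other messages in this thread identify as the real obstacle. The function name and the example values are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def earliest_drop_time(newest_write, ttl_seconds, gc_grace_seconds):
    """Earliest moment a whole SSTable could be dropped under the simplified
    rule: newest write + TTL + gc_grace_seconds. Overlaps are ignored."""
    return newest_write + timedelta(seconds=ttl_seconds + gc_grace_seconds)


# Example values: newest write on 2019-02-11, a 30-day TTL, and a 10-day
# gc_grace_seconds (864000 seconds).
newest_write = datetime(2019, 2, 11, tzinfo=timezone.utc)
print(earliest_drop_time(newest_write,
                         ttl_seconds=30 * 86400,
                         gc_grace_seconds=864000))
# 2019-03-23 00:00:00+00:00
```

In practice the SSTable can stay on disk well past this time if it overlaps with other SSTables, which is what the rest of the thread is about.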
Re: Max number of windows when using TWCS
No worries about overlapping: the use case is events/timeseries and there is almost no delay, so it should be fine.

As a side note, since we are guaranteed to have 1 SSTable per day of ingestion, it is very easy to "emulate" incremental backup. You just need to find the generated SSTable with the latest create date and back it up every day at midnight with a script.

Purging data is also straightforward: just drop SSTables (with a script) whose create date is older than a threshold. We don't even need to rely on TTL.

On Mon, Feb 11, 2019 at 9:19 PM Jeff Jirsa wrote:

> It was a wild-ass guess based on a large use case I knew about at the time.
>
> If you go above that, I expect it’d largely be fine as long as you were sure they weren’t overlapping, so reads only ever touched a small subset of the windows (ideally 1).
>
> If you have one-day windows and every read touches all of the windows, you’re going to have a bad time.
>
> --
> Jeff Jirsa
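The backup and purge scripts described above could start from something like the following sketch. It only lists candidate files and deletes nothing. It assumes that each SSTable has one component file ending in -Data.db inside the table's data directory, and it uses the file modification time as a stand-in for "create date"; both are assumptions to verify on your own nodes, and the function names are made up. Other replies in this thread caution that removing SSTables by hand is probably not safe, so treat the output as input for a review.

```python
import time
from pathlib import Path


def data_files(table_dir):
    """Return the *-Data.db files in a table directory (one per SSTable)."""
    return sorted(Path(table_dir).glob("*-Data.db"))


def older_than(table_dir, max_age_days, now=None):
    """List Data.db files last modified more than max_age_days ago.

    Nothing is deleted; the caller decides what to do with the list.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return [p for p in data_files(table_dir) if p.stat().st_mtime < cutoff]


def newest(table_dir):
    """Return the most recently modified Data.db file, or None if none exist."""
    files = data_files(table_dir)
    return max(files, key=lambda p: p.stat().st_mtime) if files else None
```

A nightly job could copy the file returned by newest() together with its sibling component files (same name prefix) to backup storage, and print the result of older_than() for the purge candidates.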
Re: Max number of windows when using TWCS
It was a wild-ass guess based on a large use case I knew about at the time.

If you go above that, I expect it’d largely be fine as long as you were sure they weren’t overlapping, so reads only ever touched a small subset of the windows (ideally 1).

If you have one-day windows and every read touches all of the windows, you’re going to have a bad time.

--
Jeff Jirsa

> On Feb 11, 2019, at 12:12 PM, DuyHai Doan wrote:
>
> Hello users,
>
> The official documentation for TWCS (http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy) advises selecting the window unit and size so that the total number of window intervals is around 20-30.
>
> Is there any explanation for this range of 20-30? What if we exceed it, for example with 1-day windows and data kept for 1 year, giving 365 intervals? What can go wrong with this?
Max number of windows when using TWCS
Hello users,

The official documentation for TWCS (http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy) advises selecting the window unit and size so that the total number of window intervals is around 20-30.

Is there any explanation for this range of 20-30? What if we exceed it, for example with 1-day windows and data kept for 1 year, giving 365 intervals? What can go wrong with this?

Regards,

Duy Hai DOAN
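The window-count arithmetic behind this question can be sketched as follows. The unit names match the TWCS compaction_window_unit values (MINUTES, HOURS, DAYS); the helper names and the default target of 30 windows are illustrative choices taken from the 20-30 guideline, not anything the documentation prescribes.

```python
import math

SECONDS_PER_UNIT = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def window_count(retention_seconds, unit, size):
    """Number of TWCS windows needed to cover the retention period."""
    window_seconds = SECONDS_PER_UNIT[unit] * size
    return math.ceil(retention_seconds / window_seconds)


def window_size_for_target(retention_seconds, unit, target_windows=30):
    """Smallest window size (in `unit`) that keeps the count at or below target."""
    unit_seconds = SECONDS_PER_UNIT[unit]
    return math.ceil(retention_seconds / (unit_seconds * target_windows))


one_year = 365 * 86400
print(window_count(one_year, "DAYS", 1))         # 365
print(window_size_for_target(one_year, "DAYS"))  # 13
print(window_count(one_year, "DAYS", 13))        # 29
```

So one year of data with 1-day windows gives 365 windows, while a 13-day window would bring the count back under 30.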