RE: TWCS Log Warning
A typo on my part; even though they are debug messages, I am interested in knowing their meaning since they happen very often. Thanks Jon for the video, that is helpful. Our data is partitioned by other metrics and not by time; it is clustered by a timestamp, and our biggest partition is ~8MB. Since we are using TWCS with a 7-day window, each week's data should be in its own bucket's SSTable. Dare I ask, is there a way to manually remove ('archive') very old time buckets at some point by removing those SSTables, or could that break things?

From: Jon Haddad
Sent: Thursday, May 23, 2024 5:43 PM
To: user@cassandra.apache.org
Cc: Bowen Song
Subject: Re: TWCS Log Warning

As an aside, if you're not putting a TTL on your data, it's a good idea to be proactive and use multiple tables. For example, one per month or year. This allows you the flexibility to delete your data by dropping old tables. Storing old data in Cassandra is expensive. Once you get to a certain point it becomes far more cost effective to offload your old data to an object store and keep your Cassandra cluster to a minimum size. I gave a talk on this topic on my YT channel: https://www.youtube.com/live/Ysfi3V2KQtU

Jon

On Thu, May 23, 2024 at 7:35 AM Bowen Song via user <user@cassandra.apache.org> wrote:

As the log level name "DEBUG" suggests, these are debug messages, not warnings. Is there any reason that made you believe these messages are warnings?
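Jon's suggestion above, keeping one table per month so old data can be deleted by dropping whole tables, is easy to script. The sketch below is a minimal illustration of the bookkeeping, not anything from the thread: the `sensor_data` prefix and the `_YYYYMM` naming scheme are hypothetical, and the actual CREATE/DROP statements would still have to be issued against the cluster.

```python
from datetime import date

def month_tables(prefix: str, start: date, end: date) -> list[str]:
    """Generate one table name per month between start and end (inclusive)."""
    names = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        names.append(f"{prefix}_{y}{m:02d}")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return names

def tables_to_drop(tables: list[str], keep: int) -> list[str]:
    """Tables older than the newest `keep` months can be dropped whole,
    reclaiming their disk space without tombstones or compaction."""
    return sorted(tables)[:-keep] if keep < len(tables) else []

tables = month_tables("sensor_data", date(2024, 1, 1), date(2024, 5, 1))
# -> ['sensor_data_202401', ..., 'sensor_data_202405']
old = tables_to_drop(tables, keep=3)
# -> ['sensor_data_202401', 'sensor_data_202402']
```

The application would route each write to the table for its timestamp's month, and a retention job would issue `DROP TABLE` for whatever `tables_to_drop` returns.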
On 23/05/2024 11:10, Isaeed Mohanna wrote:

Hi,
I have a big table (~220GB used space live, reported by tablestats) with time series data that uses TWCS with the following settings:

compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '7', 'compaction_window_unit': 'DAYS', 'max_threshold': '32', 'min_threshold': '4'}

The table does not have a TTL configured since we need the data. It now has ~450 SSTables. I have had this setup for several years and so far I am satisfied with the performance; we mostly read/write data from the previous several months. Requests for earlier data occur, but not in the same quantities, and performance is less critical then. I have recently noticed recurring warnings in the Cassandra log file and I wanted to ask about their meaning and whether I need to do something about them:

DEBUG [CompactionExecutor:356242] 2024-05-23 09:01:59,655 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:01:59,655 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356243] 2024-05-23 09:02:59,655 TimeWindowCompactionStrategy.java:122 - TWCS expired check sufficiently far in the past, checking for fully expired SSTables
DEBUG [CompactionExecutor:356243] 2024-05-23 09:02:59,658 TimeWindowCompactionStrategy.java:122 - TWCS expired check sufficiently far in the past, checking for fully expired SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:03:59,655 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:03:59,656 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356245] 2024-05-23 09:05:00,490 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356245] 2024-05-23 09:05:00,490 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356244] 2024-05-23 09:06:00,490 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables

The debug messages above appear on one of my Cassandra nodes every several minutes. I have a 4-node cluster with RF=3. Is there anything I need to do about those messages, or is it safe to ignore them?
Thank you for the help
TWCS Log Warning
Hi,
I have a big table (~220GB used space live, reported by tablestats) with time series data that uses TWCS with the following settings:

compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 'compaction_window_size': '7', 'compaction_window_unit': 'DAYS', 'max_threshold': '32', 'min_threshold': '4'}

The table does not have a TTL configured since we need the data. It now has ~450 SSTables. I have had this setup for several years and so far I am satisfied with the performance; we mostly read/write data from the previous several months. Requests for earlier data occur, but not in the same quantities, and performance is less critical then. I have recently noticed recurring warnings in the Cassandra log file and I wanted to ask about their meaning and whether I need to do something about them:

DEBUG [CompactionExecutor:356242] 2024-05-23 09:01:59,655 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:01:59,655 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356243] 2024-05-23 09:02:59,655 TimeWindowCompactionStrategy.java:122 - TWCS expired check sufficiently far in the past, checking for fully expired SSTables
DEBUG [CompactionExecutor:356243] 2024-05-23 09:02:59,658 TimeWindowCompactionStrategy.java:122 - TWCS expired check sufficiently far in the past, checking for fully expired SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:03:59,655 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:03:59,656 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356245] 2024-05-23 09:05:00,490 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356245] 2024-05-23 09:05:00,490 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
DEBUG [CompactionExecutor:356244] 2024-05-23 09:06:00,490 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables

The debug messages above appear on one of my Cassandra nodes every several minutes. I have a 4-node cluster with RF=3. Is there anything I need to do about those messages, or is it safe to ignore them?
Thank you for the help
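Since the question is how often these messages recur, one quick way to quantify that is to tally them out of `debug.log`. The snippet below is a small illustrative sketch (the regex is written against the exact line format quoted above; real log file paths and formats should be verified first):

```python
import re
from collections import Counter

# Matches the TWCS expired-check debug lines quoted above.
PATTERN = re.compile(
    r"DEBUG \[(CompactionExecutor:\d+)\] (\S+ \S+) "
    r"TimeWindowCompactionStrategy\.java:\d+ - (TWCS .+)"
)

def summarize(log_lines):
    """Count TWCS debug messages by message text."""
    counts = Counter()
    for line in log_lines:
        m = PATTERN.search(line)
        if m:
            counts[m.group(3)] += 1
    return counts

sample = [
    "DEBUG [CompactionExecutor:356242] 2024-05-23 09:01:59,655 "
    "TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables",
    "DEBUG [CompactionExecutor:356243] 2024-05-23 09:02:59,655 "
    "TimeWindowCompactionStrategy.java:122 - TWCS expired check sufficiently far in the past, "
    "checking for fully expired SSTables",
]
print(summarize(sample))
```

Feeding it the node's debug log (`summarize(open("debug.log"))`) shows how the counts break down between the "skipping check" and "checking for fully expired SSTables" variants.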
RE: Trouble After Changing Replication Factor
Hi again,
I did run repair -full without any parameters, which I understood would run repair for all keyspaces, but I do not recall seeing validation tasks running on one of my two main keyspaces with most of the data; maybe it failed or didn't run. Anyhow, I tested with a small app on a small table that I have. The app would fail before the repair, and after running repair -full on the specific table it runs fine, so I am now running a full repair on the problematic keyspace; hopefully all will be fine when the repair is done.

I am left wondering, though, why Cassandra allows this to happen. Most other operations are somewhat guarded; one would expect the RF change operation not to complete without the actual changes having been carried out. I was surprised that CL1 reads fail, and that this can cause serious data inconsistencies. Maybe it is not realistic to wait for the changes on large datasets, but I think it should be added to the documentation that reads with CL1 will fail until a full repair is completed.

Thanks everyone for the help,
Isaeed Mohanna

From: Jeff Jirsa
Sent: Tuesday, October 12, 2021 4:59 PM
To: cassandra
Subject: Re: Trouble After Changing Replication Factor

The most likely explanation is that repair failed and you didn't notice. Or that you didn't actually repair every host / every range. Which version are you using? How did you run repair?

On Tue, Oct 12, 2021 at 4:33 AM Isaeed Mohanna <isa...@xsense.co> wrote:

Hi,
Yes, I am sacrificing consistency to gain higher availability and faster speed, but my problem is not with newly inserted data that is missing for a very short period of time; my problem is that data that was there before the RF change still does not exist in all replicas, even after repair. It looks like my cluster configuration is RF3 but the data itself is still using RF2, and when the data is requested from the 3rd (new) replica, it is not there and an empty record is returned with read CL1.

What can I do to force this data to be synced to all replicas as it should be, so a read CL1 request will actually return a correct result?
Thanks

From: Bowen Song <bo...@bso.ng>
Sent: Monday, October 11, 2021 5:13 PM
To: user@cassandra.apache.org
Subject: Re: Trouble After Changing Replication Factor

You have RF=3 and both read & write CL=1, which means you are asking Cassandra to give up strong consistency in order to gain higher availability and perhaps slightly faster speed, and that's what you get. If you want strong consistency, you will need to make sure (read CL + write CL) > RF.

On 10/10/2021 11:55, Isaeed Mohanna wrote:

Hi,
We had a cluster with 3 nodes with replication factor 2, and we were using reads with consistency level ONE. We recently added a 4th node and changed the replication factor to 3. Once this was done, apps reading from the DB with CL1 would receive empty records. Looking around, I was surprised to learn that after changing the replication factor, if a read request is sent to a node that should own the record according to the new replication factor but does not have it yet, an empty record is returned because of CL1; the record will be written to that node only after the repair operation is over. We ran the repair operation, which took days in our case (we had to change apps to CL2 to avoid serious data inconsistencies). Now the repair operations are over, and if I revert to CL1 we still get errors that records do not exist in the DB while they do; using CL2 again, it works fine. Any ideas what I am missing? Is there a way to validate that the repair task has actually done what is needed and that the data is now replicated at RF3? Could it be a Cassandra driver issue? If I issue the request in cqlsh I do get the record, but I cannot know if I am hitting the replica that doesn't hold the record.
Thanks for your help
RE: Trouble After Changing Replication Factor
Hi,
Yes, I am sacrificing consistency to gain higher availability and faster speed, but my problem is not with newly inserted data that is missing for a very short period of time; my problem is that data that was there before the RF change still does not exist in all replicas, even after repair. It looks like my cluster configuration is RF3 but the data itself is still using RF2, and when the data is requested from the 3rd (new) replica, it is not there and an empty record is returned with read CL1.

What can I do to force this data to be synced to all replicas as it should be, so a read CL1 request will actually return a correct result?
Thanks

From: Bowen Song
Sent: Monday, October 11, 2021 5:13 PM
To: user@cassandra.apache.org
Subject: Re: Trouble After Changing Replication Factor

You have RF=3 and both read & write CL=1, which means you are asking Cassandra to give up strong consistency in order to gain higher availability and perhaps slightly faster speed, and that's what you get. If you want strong consistency, you will need to make sure (read CL + write CL) > RF.

On 10/10/2021 11:55, Isaeed Mohanna wrote:

Hi,
We had a cluster with 3 nodes with replication factor 2, and we were using reads with consistency level ONE. We recently added a 4th node and changed the replication factor to 3. Once this was done, apps reading from the DB with CL1 would receive empty records. Looking around, I was surprised to learn that after changing the replication factor, if a read request is sent to a node that should own the record according to the new replication factor but does not have it yet, an empty record is returned because of CL1; the record will be written to that node only after the repair operation is over. We ran the repair operation, which took days in our case (we had to change apps to CL2 to avoid serious data inconsistencies). Now the repair operations are over, and if I revert to CL1 we still get errors that records do not exist in the DB while they do; using CL2 again, it works fine. Any ideas what I am missing? Is there a way to validate that the repair task has actually done what is needed and that the data is now replicated at RF3? Could it be a Cassandra driver issue? If I issue the request in cqlsh I do get the record, but I cannot know if I am hitting the replica that doesn't hold the record.
Thanks for your help
Trouble After Changing Replication Factor
Hi,
We had a cluster with 3 nodes with replication factor 2, and we were using reads with consistency level ONE. We recently added a 4th node and changed the replication factor to 3. Once this was done, apps reading from the DB with CL1 would receive empty records. Looking around, I was surprised to learn that after changing the replication factor, if a read request is sent to a node that should own the record according to the new replication factor but does not have it yet, an empty record is returned because of CL1; the record will be written to that node only after the repair operation is over. We ran the repair operation, which took days in our case (we had to change apps to CL2 to avoid serious data inconsistencies). Now the repair operations are over, and if I revert to CL1 we still get errors that records do not exist in the DB while they do; using CL2 again, it works fine. Any ideas what I am missing? Is there a way to validate that the repair task has actually done what is needed and that the data is now replicated at RF3? Could it be a Cassandra driver issue? If I issue the request in cqlsh I do get the record, but I cannot know if I am hitting the replica that doesn't hold the record.
Thanks for your help
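Bowen's rule from later in this thread, (read CL + write CL) > RF, is pure arithmetic: reads and writes are only guaranteed to see each other if their replica sets must overlap. A small sketch of that check (illustrative only, not anything from the Cassandra codebase):

```python
def is_strongly_consistent(rf: int, write_cl: int, read_cl: int) -> bool:
    """Strong consistency requires every read's replica set to overlap
    every write's replica set: (read CL + write CL) > RF guarantees at
    least one replica in common."""
    return read_cl + write_cl > rf

def quorum(rf: int) -> int:
    """Number of replicas in a QUORUM for a given replication factor."""
    return rf // 2 + 1

# RF=3 with write CL1 and read CL1: one-replica sets may not overlap,
# so a read can land on a replica the write never touched.
print(is_strongly_consistent(3, 1, 1))                   # False
# RF=3 with QUORUM reads and writes: 2 + 2 > 3, always one common replica.
print(is_strongly_consistent(3, quorum(3), quorum(3)))   # True
```

This is exactly the situation described above: with RF=3 and CL1 on both sides, a read can be served entirely by the new replica that has not yet been repaired.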
RE: TWCS on Non TTL Data
The point is that I am NOT using TTL and I need to keep the data. So when I do switch to TWCS, will the old files be recompacted, or will they remain the same with only new incoming data using TWCS?

From: Bowen Song
Sent: Friday, September 17, 2021 9:04 PM
To: user@cassandra.apache.org
Subject: Re: TWCS on Non TTL Data

If you use TWCS with TTL, the old SSTables won't be compacted; the entire SSTable file will get dropped after it expires. I don't think you will need to manage the compaction or cleanup at all, as they are automatic. There's no space limit on the table holding the near-term data other than the overall free disk space. There's only a time limit on that table.

On 17/09/2021 16:51, Isaeed Mohanna wrote:

Thanks for the help. How does the compaction run? Does it clean up old compaction files while running or only at the end? I want to manage the free space so it does not run out while compaction is running.

From: Jim Shaw <jxys...@gmail.com>
Sent: Wednesday, September 15, 2021 3:49 PM
To: user@cassandra.apache.org
Subject: Re: TWCS on Non TTL Data

You may try rolling up the data, i.e. one table with only 1 month of data, and old data rolled up into a table keeping a year of data.
Thanks,
Jim

On Wed, Sep 15, 2021 at 1:26 AM Isaeed Mohanna <isa...@xsense.co> wrote:

My clustering column is the time series timestamp, so basically sourceId and metric type for the partition key and timestamp for the clustering key; the rest of the fields are just values outside of the primary key. Our read requests are simply "give me the values for a time range of a specific sourceId, metric combination". So I am guessing that during a read, the SSTables that contain the partition key will be found, and out of those, the ones outside the range will be excluded, correct? In practice our queries are up to a month by default; only rarely do we fetch more, when someone is exporting the data.

In reality we also get old data, that is, a source will send its information late; instead of sending it in real time it will send all of last month's/week's/day's data at once. In that case I guess the data will end up in the current bucket; will that affect performance? Assuming I start with a 1-week bucket, I could later change the time window, right?
Thanks

From: Jeff Jirsa <jji...@gmail.com>
Sent: Tuesday, September 14, 2021 10:35 PM
To: cassandra <user@cassandra.apache.org>
Subject: Re: TWCS on Non TTL Data

Inline

On Tue, Sep 14, 2021 at 11:47 AM Isaeed Mohanna <isa...@xsense.co> wrote:

Hi Jeff,
My data is partitioned by a sourceId and metric. A source is usually active up to a year, after which there are no additional writes for the partition and reads become scarce. So although this is not an explicit time component, it is time based; will that suffice?

I guess it means that a single read may touch a year of sstables. Not great, but perhaps not fatal. Hopefully your reads avoid that in practice. We'd need the full schema to be very sure (does the clustering column include month/day? If so, there are cases where that can help exclude sstables).

If I use a week bucket we will be able to serve the last few days' reads from one file and the last month from ~5, which covers the most common queries. Do you think a one-month bucket is a good idea? That would allow reading from one file most of the time, but the size of each SSTable would be ~5 times bigger.

It'll be 1-4 for the most common (up to 4 for same-bucket reads, because STCS in the first bucket is triggered at min_threshold=4), and 5 max, which seems reasonable. Way better than the 200 or so you're doing now.

When changing the compaction strategy via JMX, do I need to issue the ALTER TABLE command at the end so it will be reflected in the schema, or is that taken care of automatically? (I am using Cassandra 3.11.11.)

At the end, yes.

Thanks a lot for your help.

From: Jeff Jirsa <jji...@gmail.com>
Sent: Tuesday, September 14, 2021 4:51 PM
To: cassandra <user@cassandra.apache.org>
Subject: Re: TWCS on Non TTL Data

On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna <isa...@xsense.co> wrote:

Hi,
I have a table that stores time series data. The data is not TTLed since we want to retain it for the foreseeable future, and there are no updates or deletes (deletes could happen rarely in case some scrambled data reached the table, but that is extremely rare). We do constant writes of incoming data to the table, ~5 million a day, mostly newly generated data from the past week, but we also get old data that got stuck somewhere, though not that often. Usually our reads are for the most recent data, the last one to three months, but we do fetch older data as well for a specific time period in the past. Lately we have been facing performance trouble with this table, see the histogram below; when compaction is working on the table the performance even drops to 10-20 seconds!

Percentile SSTables
RE: TWCS on Non TTL Data
Thanks for the help. How does the compaction run? Does it clean up old compaction files while running or only at the end? I want to manage the free space so it does not run out while compaction is running.

From: Jim Shaw
Sent: Wednesday, September 15, 2021 3:49 PM
To: user@cassandra.apache.org
Subject: Re: TWCS on Non TTL Data

You may try rolling up the data, i.e. one table with only 1 month of data, and old data rolled up into a table keeping a year of data.
Thanks,
Jim

On Wed, Sep 15, 2021 at 1:26 AM Isaeed Mohanna <isa...@xsense.co> wrote:

My clustering column is the time series timestamp, so basically sourceId and metric type for the partition key and timestamp for the clustering key; the rest of the fields are just values outside of the primary key. Our read requests are simply "give me the values for a time range of a specific sourceId, metric combination". So I am guessing that during a read, the SSTables that contain the partition key will be found, and out of those, the ones outside the range will be excluded, correct? In practice our queries are up to a month by default; only rarely do we fetch more, when someone is exporting the data.

In reality we also get old data, that is, a source will send its information late; instead of sending it in real time it will send all of last month's/week's/day's data at once. In that case I guess the data will end up in the current bucket; will that affect performance? Assuming I start with a 1-week bucket, I could later change the time window, right?
Thanks

From: Jeff Jirsa <jji...@gmail.com>
Sent: Tuesday, September 14, 2021 10:35 PM
To: cassandra <user@cassandra.apache.org>
Subject: Re: TWCS on Non TTL Data

Inline

On Tue, Sep 14, 2021 at 11:47 AM Isaeed Mohanna <isa...@xsense.co> wrote:

Hi Jeff,
My data is partitioned by a sourceId and metric. A source is usually active up to a year, after which there are no additional writes for the partition and reads become scarce. So although this is not an explicit time component, it is time based; will that suffice?

I guess it means that a single read may touch a year of sstables. Not great, but perhaps not fatal. Hopefully your reads avoid that in practice. We'd need the full schema to be very sure (does the clustering column include month/day? If so, there are cases where that can help exclude sstables).

If I use a week bucket we will be able to serve the last few days' reads from one file and the last month from ~5, which covers the most common queries. Do you think a one-month bucket is a good idea? That would allow reading from one file most of the time, but the size of each SSTable would be ~5 times bigger.

It'll be 1-4 for the most common (up to 4 for same-bucket reads, because STCS in the first bucket is triggered at min_threshold=4), and 5 max, which seems reasonable. Way better than the 200 or so you're doing now.

When changing the compaction strategy via JMX, do I need to issue the ALTER TABLE command at the end so it will be reflected in the schema, or is that taken care of automatically? (I am using Cassandra 3.11.11.)

At the end, yes.

Thanks a lot for your help.

From: Jeff Jirsa <jji...@gmail.com>
Sent: Tuesday, September 14, 2021 4:51 PM
To: cassandra <user@cassandra.apache.org>
Subject: Re: TWCS on Non TTL Data

On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna <isa...@xsense.co> wrote:

Hi,
I have a table that stores time series data. The data is not TTLed since we want to retain it for the foreseeable future, and there are no updates or deletes (deletes could happen rarely in case some scrambled data reached the table, but that is extremely rare). We do constant writes of incoming data to the table, ~5 million a day, mostly newly generated data from the past week, but we also get old data that got stuck somewhere, though not that often. Usually our reads are for the most recent data, the last one to three months, but we do fetch older data as well for a specific time period in the past. Lately we have been facing performance trouble with this table, see the histogram below; when compaction is working on the table the performance even drops to 10-20 seconds!

Percentile  SSTables   Write Latency   Read Latency    Partition Size   Cell Count
                       (micros)        (micros)        (bytes)
50%         215.00     17.08           89970.66        1916             149
75%         446.00     24.60           223875.79       2759             215
95%         535.00     35.43           464228.84       8239             642
98%         642.00     51.01           668489.53       24601            1916
99%         642.00     73.46           962624.93       42510            3311
Min         0.00       2.30            10090.81        43               0
Max         770.00     1358.10         2395318.86      5839588          454826

As you can see
RE: TWCS on Non TTL Data
My clustering column is the time series timestamp, so basically sourceId and metric type for the partition key and timestamp for the clustering key; the rest of the fields are just values outside of the primary key. Our read requests are simply "give me the values for a time range of a specific sourceId, metric combination". So I am guessing that during a read, the SSTables that contain the partition key will be found, and out of those, the ones outside the range will be excluded, correct? In practice our queries are up to a month by default; only rarely do we fetch more, when someone is exporting the data.

In reality we also get old data, that is, a source will send its information late; instead of sending it in real time it will send all of last month's/week's/day's data at once. In that case I guess the data will end up in the current bucket; will that affect performance? Assuming I start with a 1-week bucket, I could later change the time window, right?
Thanks

From: Jeff Jirsa
Sent: Tuesday, September 14, 2021 10:35 PM
To: cassandra
Subject: Re: TWCS on Non TTL Data

Inline

On Tue, Sep 14, 2021 at 11:47 AM Isaeed Mohanna <isa...@xsense.co> wrote:

Hi Jeff,
My data is partitioned by a sourceId and metric. A source is usually active up to a year, after which there are no additional writes for the partition and reads become scarce. So although this is not an explicit time component, it is time based; will that suffice?

I guess it means that a single read may touch a year of sstables. Not great, but perhaps not fatal. Hopefully your reads avoid that in practice. We'd need the full schema to be very sure (does the clustering column include month/day? If so, there are cases where that can help exclude sstables).

If I use a week bucket we will be able to serve the last few days' reads from one file and the last month from ~5, which covers the most common queries. Do you think a one-month bucket is a good idea? That would allow reading from one file most of the time, but the size of each SSTable would be ~5 times bigger.

It'll be 1-4 for the most common (up to 4 for same-bucket reads, because STCS in the first bucket is triggered at min_threshold=4), and 5 max, which seems reasonable. Way better than the 200 or so you're doing now.

When changing the compaction strategy via JMX, do I need to issue the ALTER TABLE command at the end so it will be reflected in the schema, or is that taken care of automatically? (I am using Cassandra 3.11.11.)

At the end, yes.

Thanks a lot for your help.

From: Jeff Jirsa <jji...@gmail.com>
Sent: Tuesday, September 14, 2021 4:51 PM
To: cassandra <user@cassandra.apache.org>
Subject: Re: TWCS on Non TTL Data

On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna <isa...@xsense.co> wrote:

Hi,
I have a table that stores time series data. The data is not TTLed since we want to retain it for the foreseeable future, and there are no updates or deletes (deletes could happen rarely in case some scrambled data reached the table, but that is extremely rare). We do constant writes of incoming data to the table, ~5 million a day, mostly newly generated data from the past week, but we also get old data that got stuck somewhere, though not that often. Usually our reads are for the most recent data, the last one to three months, but we do fetch older data as well for a specific time period in the past. Lately we have been facing performance trouble with this table, see the histogram below; when compaction is working on the table the performance even drops to 10-20 seconds!

Percentile  SSTables   Write Latency   Read Latency    Partition Size   Cell Count
                       (micros)        (micros)        (bytes)
50%         215.00     17.08           89970.66        1916             149
75%         446.00     24.60           223875.79       2759             215
95%         535.00     35.43           464228.84       8239             642
98%         642.00     51.01           668489.53       24601            1916
99%         642.00     73.46           962624.93       42510            3311
Min         0.00       2.30            10090.81        43               0
Max         770.00     1358.10         2395318.86      5839588          454826

As you can see, we are scanning hundreds of sstables. It turns out we are using DTCS (min:4, max:32); the table folder contains ~33K files, ~130GB per node (cleanup pending after increasing the cluster), and compaction takes a very long time to complete. As I understand it, DTCS is deprecated, so my questions:

1. Should we switch to TWCS even though our data is not TTLed? Since we do not delete at all, can we still use it? Will it improve performance?

It will probably be better than DTCS here, but you'll still have potentially lots of sstables over time. Lots of sstables in itself isn't a big deal
RE: TWCS on Non TTL Data
Hi Jeff,
My data is partitioned by a sourceId and metric. A source is usually active up to a year, after which there are no additional writes for the partition and reads become scarce. So although this is not an explicit time component, it is time based; will that suffice?

If I use a week bucket we will be able to serve the last few days' reads from one file and the last month from ~5, which covers the most common queries. Do you think a one-month bucket is a good idea? That would allow reading from one file most of the time, but the size of each SSTable would be ~5 times bigger.

When changing the compaction strategy via JMX, do I need to issue the ALTER TABLE command at the end so it will be reflected in the schema, or is that taken care of automatically? (I am using Cassandra 3.11.11.)

Thanks a lot for your help.

From: Jeff Jirsa
Sent: Tuesday, September 14, 2021 4:51 PM
To: cassandra
Subject: Re: TWCS on Non TTL Data

On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna <isa...@xsense.co> wrote:

Hi,
I have a table that stores time series data. The data is not TTLed since we want to retain it for the foreseeable future, and there are no updates or deletes (deletes could happen rarely in case some scrambled data reached the table, but that is extremely rare). We do constant writes of incoming data to the table, ~5 million a day, mostly newly generated data from the past week, but we also get old data that got stuck somewhere, though not that often. Usually our reads are for the most recent data, the last one to three months, but we do fetch older data as well for a specific time period in the past. Lately we have been facing performance trouble with this table, see the histogram below; when compaction is working on the table the performance even drops to 10-20 seconds!

Percentile  SSTables   Write Latency   Read Latency    Partition Size   Cell Count
                       (micros)        (micros)        (bytes)
50%         215.00     17.08           89970.66        1916             149
75%         446.00     24.60           223875.79       2759             215
95%         535.00     35.43           464228.84       8239             642
98%         642.00     51.01           668489.53       24601            1916
99%         642.00     73.46           962624.93       42510            3311
Min         0.00       2.30            10090.81        43               0
Max         770.00     1358.10         2395318.86      5839588          454826

As you can see, we are scanning hundreds of sstables. It turns out we are using DTCS (min:4, max:32); the table folder contains ~33K files, ~130GB per node (cleanup pending after increasing the cluster), and compaction takes a very long time to complete. As I understand it, DTCS is deprecated, so my questions:

1. Should we switch to TWCS even though our data is not TTLed? Since we do not delete at all, can we still use it? Will it improve performance?

It will probably be better than DTCS here, but you'll still have potentially lots of sstables over time. Lots of sstables in itself isn't a big deal, the problem comes from scanning more than a handful on each read. Does your table have some form of date bucketing to avoid touching old data files?

2. If we should switch, I am thinking of using a time window of a week; this way a read will scan tens of sstables instead of hundreds today. Does that sound reasonable?

10s is better than hundreds, but it's still a lot.

3. Is there a recommended size of a window bucket in terms of disk space?

When I wrote it, I wrote it for a use case that had 30 windows over the whole set of data. Since then, I've seen it used with anywhere from 5 to 60 buckets. With no TTL, you're effectively doing infinite buckets. So the only way to ensure you're not touching too many sstables is to put the date (in some form) into the partition key and let the database use that (+ bloom filters) to avoid reading too many sstables.

4. If TWCS is not a good idea, should I switch to STCS instead? Could that yield better performance than the current situation?

LCS will give you better read performance. STCS will probably be better than DTCS given the 215 sstable p50 you're seeing (which is crazy btw, I'm surprised you're not just OOMing).

5. What are the risks of changing compaction strategy on a production system? Can it be done on the fly, or is it better to go through a full test and backup cycle?

The risk is you trigger a ton of compactions, which drops the performance of the whole system all at once, and your front-door queries all time out. You can approach this a few ways:
- Use the JMX endpoint to change compaction on one instance at a time (rather than doing it in the schema), which lets you control how many nodes are re-writing all their data at any given point in time
- You ca
TWCS on Non TTL Data
Hi,
I have a table that stores time series data. The data is not TTLed since we want to retain it for the foreseeable future, and there are no updates or deletes (deletes could happen rarely in case some scrambled data reached the table, but that is extremely rare). We do constant writes of incoming data to the table, ~5 million a day, mostly newly generated data from the past week, but we also get old data that got stuck somewhere, though not that often. Usually our reads are for the most recent data, the last one to three months, but we do fetch older data as well for a specific time period in the past. Lately we have been facing performance trouble with this table, see the histogram below; when compaction is working on the table the performance even drops to 10-20 seconds!

Percentile  SSTables   Write Latency   Read Latency    Partition Size   Cell Count
                       (micros)        (micros)        (bytes)
50%         215.00     17.08           89970.66        1916             149
75%         446.00     24.60           223875.79       2759             215
95%         535.00     35.43           464228.84       8239             642
98%         642.00     51.01           668489.53       24601            1916
99%         642.00     73.46           962624.93       42510            3311
Min         0.00       2.30            10090.81        43               0
Max         770.00     1358.10         2395318.86      5839588          454826

As you can see, we are scanning hundreds of sstables. It turns out we are using DTCS (min:4, max:32); the table folder contains ~33K files, ~130GB per node (cleanup pending after increasing the cluster), and compaction takes a very long time to complete. As I understand it, DTCS is deprecated, so my questions:

1. Should we switch to TWCS even though our data is not TTLed? Since we do not delete at all, can we still use it? Will it improve performance?
2. If we should switch, I am thinking of using a time window of a week; this way a read will scan tens of sstables instead of hundreds today. Does that sound reasonable?
3. Is there a recommended size of a window bucket in terms of disk space?
4. If TWCS is not a good idea, should I switch to STCS instead? Could that yield better performance than the current situation?
5. What are the risks of changing compaction strategy on a production system? Can it be done on the fly, or is it better to go through a full test and backup cycle?

All input will be appreciated,
Thank you
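The window-sizing discussion in the replies comes down to simple arithmetic: with no TTL, TWCS keeps roughly one fully-compacted SSTable per elapsed window, and a query ending "now" can additionally touch up to min_threshold SSTables in the still-compacting current window. The sketch below is a back-of-the-envelope estimate under those assumptions (it ignores late-arriving data landing in the current bucket and same-partition overlap across windows):

```python
import math

def windows_retained(retention_days: int, window_days: int) -> int:
    """With no TTL, TWCS accumulates roughly one major SSTable per window."""
    return math.ceil(retention_days / window_days)

def sstables_touched(query_span_days: int, window_days: int,
                     min_threshold: int = 4) -> int:
    """Worst-case SSTables a time-range query ending 'now' may touch:
    up to min_threshold in the still-compacting current window (STCS
    inside the first bucket triggers at min_threshold), plus one
    compacted SSTable per additional window the query covers."""
    older_windows = max(0, math.ceil(query_span_days / window_days) - 1)
    return min_threshold + older_windows

# Three years of data in 7-day windows: ~157 SSTables on disk for the table.
print(windows_retained(3 * 365, 7))   # 157
# A one-month query over weekly windows: ~4 + 4 = 8 SSTables worst case,
# versus the ~215 at p50 reported in the histogram above under DTCS.
print(sstables_touched(30, 7))        # 8
```

This matches Jeff's "1-4 for the most common, 5 max" estimate for queries within the last month, and makes the trade-off of a one-month window concrete: fewer, larger SSTables per query, but each about 4-5 times bigger.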