RE: TWCS Log Warning

2024-05-27 Thread Isaeed Mohanna
A typo; even though they are debug messages, I am interested in knowing their meaning since they happen very often.

Thanks Jon for the video, that is helpful. Our data is partitioned by other metrics and not by time; it is clustered by a timestamp, and our biggest partition is ~8MB.

Since we are using TWCS with a 7-day window, each week's data should be in its own bucket SSTable. Dare I ask: is there a way to manually remove ('archive') very old time buckets at some point by removing those SSTables, or could that break things?

From: Jon Haddad 
Sent: Thursday, May 23, 2024 5:43 PM
To: user@cassandra.apache.org
Cc: Bowen Song 
Subject: Re: TWCS Log Warning

As an aside, if you're not putting a TTL on your data, it's a good idea to be 
proactive and use multiple tables.  For example, one per month or year.  This 
allows you the flexibility to delete your data by dropping old tables.
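As a rough illustration of that approach (the keyspace, table, and column names below are hypothetical, not taken from this thread), writes go to per-year tables and an old year is later dropped in one statement:

CREATE TABLE metrics.readings_2023 (
    source_id uuid,
    metric text,
    ts timestamp,
    value double,
    PRIMARY KEY ((source_id, metric), ts)
);

CREATE TABLE metrics.readings_2024 (
    source_id uuid,
    metric text,
    ts timestamp,
    value double,
    PRIMARY KEY ((source_id, metric), ts)
);

-- once 2023 is no longer needed (or has been offloaded to object storage):
DROP TABLE metrics.readings_2023;

The application then picks the table by year when reading and writing.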

Storing old data in Cassandra is expensive.  Once you get to a certain point it 
becomes far more cost effective to offload your old data to an object store and 
keep your Cassandra cluster to a minimum size.

I gave a talk on this topic on my YT channel: 
https://www.youtube.com/live/Ysfi3V2KQtU

Jon


On Thu, May 23, 2024 at 7:35 AM Bowen Song via user 
<user@cassandra.apache.org> wrote:

As the log level name "DEBUG" suggests, these are debug messages, not warnings.

Is there any reason that made you believe these messages are warnings?


On 23/05/2024 11:10, Isaeed Mohanna wrote:
Hi
I have a big table (~220GB of live used space reported by tablestats) with time series data that uses TWCS with the following settings:
compaction = {'class': 
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
'compaction_window_size': '7', 'compaction_window_unit': 'DAYS', 
'max_threshold': '32', 'min_threshold': '4'}
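For reference, a setting like that would normally be applied with an ALTER TABLE statement along these lines (the keyspace and table names are placeholders, not from this thread):

ALTER TABLE my_keyspace.my_timeseries
WITH compaction = {
    'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
    'compaction_window_size': '7',
    'compaction_window_unit': 'DAYS',
    'max_threshold': '32',
    'min_threshold': '4'
};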
The table does not have a TTL configured since we need the data. It now has ~450 SSTables. I have had this setup for several years and so far I am satisfied with the performance; we mostly read/write data from the previous several months. Requests for earlier data occur, but not in the same quantities, and performance is less critical then.
I have recently noticed recurring warnings in the Cassandra log file and I wanted to ask about their meaning and whether I need to do something about them.
DEBUG [CompactionExecutor:356242] 2024-05-23 09:01:59,655 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:01:59,655 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356243] 2024-05-23 09:02:59,655 
TimeWindowCompactionStrategy.java:122 - TWCS expired check sufficiently far in 
the past, checking for fully expired SSTables
DEBUG [CompactionExecutor:356243] 2024-05-23 09:02:59,658 
TimeWindowCompactionStrategy.java:122 - TWCS expired check sufficiently far in 
the past, checking for fully expired SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:03:59,655 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:03:59,656 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356245] 2024-05-23 09:05:00,490 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356245] 2024-05-23 09:05:00,490 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356244] 2024-05-23 09:06:00,490 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables

The debug messages above appear on one of my Cassandra nodes every few minutes; I have a 4-node cluster with RF=3.
Is there anything I need to do about these messages, or is it safe to ignore them?
Thank you for the help


TWCS Log Warning

2024-05-23 Thread Isaeed Mohanna
Hi
I have a big table (~220GB of live used space reported by tablestats) with time series data that uses TWCS with the following settings:
compaction = {'class': 
'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
'compaction_window_size': '7', 'compaction_window_unit': 'DAYS', 
'max_threshold': '32', 'min_threshold': '4'}
The table does not have a TTL configured since we need the data. It now has ~450 SSTables. I have had this setup for several years and so far I am satisfied with the performance; we mostly read/write data from the previous several months. Requests for earlier data occur, but not in the same quantities, and performance is less critical then.
I have recently noticed recurring warnings in the Cassandra log file and I wanted to ask about their meaning and whether I need to do something about them.
DEBUG [CompactionExecutor:356242] 2024-05-23 09:01:59,655 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:01:59,655 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356243] 2024-05-23 09:02:59,655 
TimeWindowCompactionStrategy.java:122 - TWCS expired check sufficiently far in 
the past, checking for fully expired SSTables
DEBUG [CompactionExecutor:356243] 2024-05-23 09:02:59,658 
TimeWindowCompactionStrategy.java:122 - TWCS expired check sufficiently far in 
the past, checking for fully expired SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:03:59,655 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356242] 2024-05-23 09:03:59,656 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356245] 2024-05-23 09:05:00,490 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356245] 2024-05-23 09:05:00,490 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables
DEBUG [CompactionExecutor:356244] 2024-05-23 09:06:00,490 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired 
SSTables

The debug messages above appear on one of my Cassandra nodes every few minutes; I have a 4-node cluster with RF=3.
Is there anything I need to do about these messages, or is it safe to ignore them?
Thank you for the help


RE: Trouble After Changing Replication Factor

2021-10-13 Thread Isaeed Mohanna
Hi again
I did run repair -full without any parameters, which I understood would run repair for all keyspaces, but I do not recall seeing validation tasks running on one of my two main keyspaces with most of the data. Maybe it failed or didn't run.
Anyhow, I tested with a small app on a small table that I have. The app would fail before the repair, and after running repair -full on that specific table it runs fine, so I am now running a full repair on the problematic keyspace; hopefully all will be fine when the repair is done.
I am left wondering, though, why Cassandra allows this to happen. Most other operations are somewhat guarded; one would expect the RF change operation not to complete without the actual changes having been carried out. I was surprised that CL1 reads fail, which could cause serious data inconsistencies. Maybe waiting for the changes is not realistic for large datasets, but I think the documentation should warn that reads with CL1 can fail until a full repair is completed.
Thanks everyone for the help,
Isaeed Mohanna


From: Jeff Jirsa 
Sent: Tuesday, October 12, 2021 4:59 PM
To: cassandra 
Subject: Re: Trouble After Changing Replication Factor

The most likely explanation is that repair failed and you didn't notice.
Or that you didn't actually repair every host / every range.

Which version are you using?
How did you run repair?


On Tue, Oct 12, 2021 at 4:33 AM Isaeed Mohanna 
<isa...@xsense.co> wrote:
Hi
Yes, I am sacrificing consistency to gain higher availability and faster speed, but my problem is not with newly inserted data that is missing for a very short period of time; my problem is that the data that was there before the RF change still does not exist on all replicas, even after repair.
It looks like my cluster configuration is RF3 but the data itself is still using RF2, and when the data is requested from the 3rd (new) replica, it is not there and an empty record is returned with read CL1.
What can I do to force this data to be synced to all replicas as it should be, so that a read CL1 request will actually return a correct result?

Thanks

From: Bowen Song <bo...@bso.ng>
Sent: Monday, October 11, 2021 5:13 PM
To: user@cassandra.apache.org
Subject: Re: Trouble After Changing Replication Factor


You have RF=3 and both read & write CL=1, which means you are asking Cassandra to give up strong consistency in order to gain higher availability and perhaps slightly faster speed, and that's what you get. If you want to have strong consistency, you will need to make sure (read CL + write CL) > RF.
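As an editorial example of that rule (the table and key below are placeholders, not from this thread): with RF=3, writing and reading at QUORUM gives (2 + 2) > 3, so every read overlaps with every successful write on at least one replica. In cqlsh this looks roughly like:

CONSISTENCY QUORUM;
INSERT INTO my_keyspace.my_table (id, value) VALUES (1, 'x');
SELECT value FROM my_keyspace.my_table WHERE id = 1;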
On 10/10/2021 11:55, Isaeed Mohanna wrote:
Hi
We had a cluster with 3 nodes with replication factor 2, and we were using reads with consistency level ONE.
We recently added a 4th node and changed the replication factor to 3. Once this was done, apps reading from the DB with CL1 would receive an empty record. Looking around, I was surprised to learn that upon changing the replication factor, if a read request is sent to a node that should own the record according to the new replication factor but does not have it yet, then an empty record is returned because of CL1; the record is only written to that node after the repair operation is over.
We ran the repair operation which took days in our case (we had to change apps 
to CL2 to avoid serious data inconsistencies).
Now the repair operations are over, and if I revert to CL1 we are still getting errors that records do not exist in the DB while they do; using CL2 again, it works fine.
Any ideas what I am missing?
Is there a way to validate that the repair task has actually done what is needed and that the data is now actually replicated at RF3?
Could it be a Cassandra driver issue? If I issue the request in cqlsh I do get the record, but I cannot know whether I am hitting the replica that doesn't hold the record.
Thanks for your help


RE: Trouble After Changing Replication Factor

2021-10-12 Thread Isaeed Mohanna
Hi
Yes, I am sacrificing consistency to gain higher availability and faster speed, but my problem is not with newly inserted data that is missing for a very short period of time; my problem is that the data that was there before the RF change still does not exist on all replicas, even after repair.
It looks like my cluster configuration is RF3 but the data itself is still using RF2, and when the data is requested from the 3rd (new) replica, it is not there and an empty record is returned with read CL1.
What can I do to force this data to be synced to all replicas as it should be, so that a read CL1 request will actually return a correct result?

Thanks

From: Bowen Song 
Sent: Monday, October 11, 2021 5:13 PM
To: user@cassandra.apache.org
Subject: Re: Trouble After Changing Replication Factor


You have RF=3 and both read & write CL=1, which means you are asking Cassandra to give up strong consistency in order to gain higher availability and perhaps slightly faster speed, and that's what you get. If you want to have strong consistency, you will need to make sure (read CL + write CL) > RF.
On 10/10/2021 11:55, Isaeed Mohanna wrote:
Hi
We had a cluster with 3 nodes with replication factor 2, and we were using reads with consistency level ONE.
We recently added a 4th node and changed the replication factor to 3. Once this was done, apps reading from the DB with CL1 would receive an empty record. Looking around, I was surprised to learn that upon changing the replication factor, if a read request is sent to a node that should own the record according to the new replication factor but does not have it yet, then an empty record is returned because of CL1; the record is only written to that node after the repair operation is over.
We ran the repair operation which took days in our case (we had to change apps 
to CL2 to avoid serious data inconsistencies).
Now the repair operations are over, and if I revert to CL1 we are still getting errors that records do not exist in the DB while they do; using CL2 again, it works fine.
Any ideas what I am missing?
Is there a way to validate that the repair task has actually done what is needed and that the data is now actually replicated at RF3?
Could it be a Cassandra driver issue? If I issue the request in cqlsh I do get the record, but I cannot know whether I am hitting the replica that doesn't hold the record.
Thanks for your help


Trouble After Changing Replication Factor

2021-10-10 Thread Isaeed Mohanna
Hi
We had a cluster with 3 nodes with replication factor 2, and we were using reads with consistency level ONE.
We recently added a 4th node and changed the replication factor to 3. Once this was done, apps reading from the DB with CL1 would receive an empty record. Looking around, I was surprised to learn that upon changing the replication factor, if a read request is sent to a node that should own the record according to the new replication factor but does not have it yet, then an empty record is returned because of CL1; the record is only written to that node after the repair operation is over.
We ran the repair operation which took days in our case (we had to change apps 
to CL2 to avoid serious data inconsistencies).
Now the repair operations are over, and if I revert to CL1 we are still getting errors that records do not exist in the DB while they do; using CL2 again, it works fine.
Any ideas what I am missing?
Is there a way to validate that the repair task has actually done what is needed and that the data is now actually replicated at RF3?
Could it be a Cassandra driver issue? If I issue the request in cqlsh I do get the record, but I cannot know whether I am hitting the replica that doesn't hold the record.
Thanks for your help
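One rough way to spot-check a specific record from cqlsh (an editorial sketch with placeholder names, not advice from the thread): a read at CONSISTENCY ALL requires every replica to respond and, on a digest mismatch, should trigger a read repair for that key, while repeating the read at CONSISTENCY ONE may reveal a replica that still returns nothing:

CONSISTENCY ALL;
SELECT * FROM my_keyspace.my_table WHERE id = 42;

CONSISTENCY ONE;
SELECT * FROM my_keyspace.my_table WHERE id = 42;   -- repeat a few times; the coordinator may pick different replicas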


RE: TWCS on Non TTL Data

2021-09-19 Thread Isaeed Mohanna
The point is that I am NOT using TTL and I need to keep the data. So when I do the switch to TWCS, will the old files be recompacted, or will they remain the same and only new data coming in will use TWCS?

From: Bowen Song 
Sent: Friday, September 17, 2021 9:04 PM
To: user@cassandra.apache.org
Subject: Re: TWCS on Non TTL Data


If you use TWCS with TTL, the old SSTables won't be compacted; the entire SSTable file will get dropped after it expires. I don't think you will need to manage the compaction or cleanup at all, as they are automatic. There's no space limit on the table holding the near-term data other than the overall free disk space. There's only a time limit on that table.
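For context, the TTL-driven setup described above would look roughly like this (placeholder names and an arbitrary 90-day retention, not a configuration from this thread); whole SSTables get dropped once everything in them has expired:

CREATE TABLE my_keyspace.recent_readings (
    source_id uuid,
    metric text,
    ts timestamp,
    value double,
    PRIMARY KEY ((source_id, metric), ts)
) WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_size': '7',
      'compaction_window_unit': 'DAYS'
  }
  AND default_time_to_live = 7776000;  -- 90 days, in seconds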
On 17/09/2021 16:51, Isaeed Mohanna wrote:
Thanks for the help.
How does the compaction run? Does it clean up old files while running or only at the end? I want to manage the free space so it does not run out while compaction is running.


From: Jim Shaw <jxys...@gmail.com>
Sent: Wednesday, September 15, 2021 3:49 PM
To: user@cassandra.apache.org
Subject: Re: TWCS on Non TTL Data

You may try rolling up the data, i.e. one table with only 1 month of data, and old data rolled up into a table that keeps a year of data.

Thanks,
Jim
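A loose CQL sketch of that roll-up idea (all names, types, and the one-month TTL are assumptions for illustration): the application periodically copies, and optionally aggregates, rows from the short-lived table into the long-lived one, which keeps no TTL here since the data must be retained:

CREATE TABLE my_keyspace.readings_recent (
    source_id uuid,
    metric text,
    ts timestamp,
    value double,
    PRIMARY KEY ((source_id, metric), ts)
) WITH default_time_to_live = 2678400;  -- ~1 month, in seconds

CREATE TABLE my_keyspace.readings_rollup (
    source_id uuid,
    metric text,
    ts timestamp,
    value double,
    PRIMARY KEY ((source_id, metric), ts)
);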

On Wed, Sep 15, 2021 at 1:26 AM Isaeed Mohanna 
<isa...@xsense.co> wrote:
My clustering column is the time series timestamp: basically sourceId and metric type for the partition key and the timestamp for the clustering key; the rest of the fields are just values outside the primary key. Our read requests are simply "give me the values for a time range of a specific sourceId, metric combination". So I am guessing that during a read, the SSTables that contain the partition key will be found and, out of those, the ones that are outside the requested range will be excluded, correct?
In practice our queries cover up to a month by default; only rarely do we fetch more, when someone is exporting the data or similar.

In reality we also get old data; that is, a source will send its information late: instead of sending it in real time, it will send all of the last month's/week's/day's data at once. In that case I guess the data will end up in the current bucket; will that affect performance?

Assuming I start with a 1-week bucket, I could later change the time window, right?

Thanks


From: Jeff Jirsa <jji...@gmail.com>
Sent: Tuesday, September 14, 2021 10:35 PM
To: cassandra <user@cassandra.apache.org>
Subject: Re: TWCS on Non TTL Data

Inline

On Tue, Sep 14, 2021 at 11:47 AM Isaeed Mohanna 
<isa...@xsense.co> wrote:
Hi Jeff
My data is partitioned by a sourceId and metric. A source is usually active up to a year, after which there are no additional writes for the partition and reads become scarce. So although this is not an explicit time component, it is time based; will that suffice?

I guess it means that a single read may touch a year of sstables. Not great, 
but perhaps not fatal. Hopefully your reads avoid that in practice. We'd need 
the full schema to be very sure (does clustering column include month/day? if 
so, there are cases where that can help exclude sstables)


If I use a week bucket we will be able to serve reads for the last few days from one file and for the last month from ~5, which are the most common queries. Do you think a month bucket is a good idea? That would allow reading from one file most of the time, but the size of each SSTable will be ~5 times bigger.

It'll be 1-4 for most common (up to 4 for same bucket reads because STCS in the 
first bucket is triggered at min_threshold=4), and 5 max, seems reasonable. Way 
better than the 200 or so you're doing now.


When changing the compaction strategy via JMX, do I need to issue the ALTER TABLE command at the end so it will be reflected in the schema, or is that taken care of automatically? (I am using Cassandra 3.11.11)


At the end, yes.

Thanks a lot for your help.








From: Jeff Jirsa <jji...@gmail.com>
Sent: Tuesday, September 14, 2021 4:51 PM
To: cassandra <user@cassandra.apache.org>
Subject: Re: TWCS on Non TTL Data



On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna 
<isa...@xsense.co> wrote:
Hi
I have a table that stores time series data. The data is not TTLed since we want to retain it for the foreseeable future, and there are no updates or deletes (deletes could happen rarely in case some scrambled data reached the table, but it's extremely rare).
Usually we do a constant write of incoming data to the table, ~5 million rows a day, mostly newly generated data from the past week, but we also get old data that got stuck somewhere, though not that often. Usually our reads are for the most recent data, the last one to three months, but we do fetch older data as well for specific time periods in the past.
Lately we have been facing performance trouble with this table (see the histogram below); when compaction is working on the table, performance even drops to 10-20 seconds!
Percentile  SSTables

RE: TWCS on Non TTL Data

2021-09-17 Thread Isaeed Mohanna
Thanks for the help.
How does the compaction run? Does it clean up old files while running or only at the end? I want to manage the free space so it does not run out while compaction is running.


From: Jim Shaw 
Sent: Wednesday, September 15, 2021 3:49 PM
To: user@cassandra.apache.org
Subject: Re: TWCS on Non TTL Data

You may try rolling up the data, i.e. one table with only 1 month of data, and old data rolled up into a table that keeps a year of data.

Thanks,
Jim

On Wed, Sep 15, 2021 at 1:26 AM Isaeed Mohanna 
<isa...@xsense.co> wrote:
My clustering column is the time series timestamp: basically sourceId and metric type for the partition key and the timestamp for the clustering key; the rest of the fields are just values outside the primary key. Our read requests are simply "give me the values for a time range of a specific sourceId, metric combination". So I am guessing that during a read, the SSTables that contain the partition key will be found and, out of those, the ones that are outside the requested range will be excluded, correct?
In practice our queries cover up to a month by default; only rarely do we fetch more, when someone is exporting the data or similar.

In reality we also get old data; that is, a source will send its information late: instead of sending it in real time, it will send all of the last month's/week's/day's data at once. In that case I guess the data will end up in the current bucket; will that affect performance?

Assuming I start with a 1-week bucket, I could later change the time window, right?

Thanks


From: Jeff Jirsa <jji...@gmail.com>
Sent: Tuesday, September 14, 2021 10:35 PM
To: cassandra <user@cassandra.apache.org>
Subject: Re: TWCS on Non TTL Data

Inline

On Tue, Sep 14, 2021 at 11:47 AM Isaeed Mohanna 
<isa...@xsense.co> wrote:
Hi Jeff
My data is partitioned by a sourceId and metric. A source is usually active up to a year, after which there are no additional writes for the partition and reads become scarce. So although this is not an explicit time component, it is time based; will that suffice?

I guess it means that a single read may touch a year of sstables. Not great, 
but perhaps not fatal. Hopefully your reads avoid that in practice. We'd need 
the full schema to be very sure (does clustering column include month/day? if 
so, there are cases where that can help exclude sstables)


If I use a week bucket we will be able to serve reads for the last few days from one file and for the last month from ~5, which are the most common queries. Do you think a month bucket is a good idea? That would allow reading from one file most of the time, but the size of each SSTable will be ~5 times bigger.

It'll be 1-4 for most common (up to 4 for same bucket reads because STCS in the 
first bucket is triggered at min_threshold=4), and 5 max, seems reasonable. Way 
better than the 200 or so you're doing now.


When changing the compaction strategy via JMX, do I need to issue the ALTER TABLE command at the end so it will be reflected in the schema, or is that taken care of automatically? (I am using Cassandra 3.11.11)


At the end, yes.

Thanks a lot for your help.








From: Jeff Jirsa <jji...@gmail.com>
Sent: Tuesday, September 14, 2021 4:51 PM
To: cassandra <user@cassandra.apache.org>
Subject: Re: TWCS on Non TTL Data



On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna 
<isa...@xsense.co> wrote:
Hi
I have a table that stores time series data. The data is not TTLed since we want to retain it for the foreseeable future, and there are no updates or deletes (deletes could happen rarely in case some scrambled data reached the table, but it's extremely rare).
Usually we do a constant write of incoming data to the table, ~5 million rows a day, mostly newly generated data from the past week, but we also get old data that got stuck somewhere, though not that often. Usually our reads are for the most recent data, the last one to three months, but we do fetch older data as well for specific time periods in the past.
Lately we have been facing performance trouble with this table (see the histogram below); when compaction is working on the table, performance even drops to 10-20 seconds!
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         215.00    17.08          89970.66      1916            149
75%         446.00    24.60          223875.79     2759            215
95%         535.00    35.43          464228.84     8239            642
98%         642.00    51.01          668489.53     24601           1916
99%         642.00    73.46          962624.93     42510           3311
Min         0.00      2.30           10090.81      43              0
Max         770.00    1358.10        2395318.86    5839588         454826

As u c

RE: TWCS on Non TTL Data

2021-09-14 Thread Isaeed Mohanna
My clustering column is the time series timestamp: basically sourceId and metric type for the partition key and the timestamp for the clustering key; the rest of the fields are just values outside the primary key. Our read requests are simply "give me the values for a time range of a specific sourceId, metric combination". So I am guessing that during a read, the SSTables that contain the partition key will be found and, out of those, the ones that are outside the requested range will be excluded, correct?
In practice our queries cover up to a month by default; only rarely do we fetch more, when someone is exporting the data or similar.

In reality we also get old data; that is, a source will send its information late: instead of sending it in real time, it will send all of the last month's/week's/day's data at once. In that case I guess the data will end up in the current bucket; will that affect performance?

Assuming I start with a 1-week bucket, I could later change the time window, right?

Thanks


From: Jeff Jirsa 
Sent: Tuesday, September 14, 2021 10:35 PM
To: cassandra 
Subject: Re: TWCS on Non TTL Data

Inline

On Tue, Sep 14, 2021 at 11:47 AM Isaeed Mohanna 
<isa...@xsense.co> wrote:
Hi Jeff
My data is partitioned by a sourceId and metric. A source is usually active up to a year, after which there are no additional writes for the partition and reads become scarce. So although this is not an explicit time component, it is time based; will that suffice?

I guess it means that a single read may touch a year of sstables. Not great, 
but perhaps not fatal. Hopefully your reads avoid that in practice. We'd need 
the full schema to be very sure (does clustering column include month/day? if 
so, there are cases where that can help exclude sstables)


If I use a week bucket we will be able to serve reads for the last few days from one file and for the last month from ~5, which are the most common queries. Do you think a month bucket is a good idea? That would allow reading from one file most of the time, but the size of each SSTable will be ~5 times bigger.

It'll be 1-4 for most common (up to 4 for same bucket reads because STCS in the 
first bucket is triggered at min_threshold=4), and 5 max, seems reasonable. Way 
better than the 200 or so you're doing now.


When changing the compaction strategy via JMX, do I need to issue the ALTER TABLE command at the end so it will be reflected in the schema, or is that taken care of automatically? (I am using Cassandra 3.11.11)


At the end, yes.

Thanks a lot for your help.








From: Jeff Jirsa <jji...@gmail.com>
Sent: Tuesday, September 14, 2021 4:51 PM
To: cassandra <user@cassandra.apache.org>
Subject: Re: TWCS on Non TTL Data



On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna 
<isa...@xsense.co> wrote:
Hi
I have a table that stores time series data. The data is not TTLed since we want to retain it for the foreseeable future, and there are no updates or deletes (deletes could happen rarely in case some scrambled data reached the table, but it's extremely rare).
Usually we do a constant write of incoming data to the table, ~5 million rows a day, mostly newly generated data from the past week, but we also get old data that got stuck somewhere, though not that often. Usually our reads are for the most recent data, the last one to three months, but we do fetch older data as well for specific time periods in the past.
Lately we have been facing performance trouble with this table (see the histogram below); when compaction is working on the table, performance even drops to 10-20 seconds!
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         215.00    17.08          89970.66      1916            149
75%         446.00    24.60          223875.79     2759            215
95%         535.00    35.43          464228.84     8239            642
98%         642.00    51.01          668489.53     24601           1916
99%         642.00    73.46          962624.93     42510           3311
Min         0.00      2.30           10090.81      43              0
Max         770.00    1358.10        2395318.86    5839588         454826

As you can see, we are scanning hundreds of SSTables. It turns out we are using DTCS (min: 4, max: 32); the table folder contains ~33K files totaling ~130GB per node (cleanup pending after increasing the cluster), and compaction takes a very long time to complete.
As I understand it, DTCS is deprecated, so my questions:

  1.  Should we switch to TWCS even though our data is not TTLed? Since we do not delete at all, can we still use it? Will it improve performance?
It will probably be better than DTCS here, but you'll still have potentially 
lots of sstables over time.

Lots of sstables in itself isn't a big deal

RE: TWCS on Non TTL Data

2021-09-14 Thread Isaeed Mohanna
Hi Jeff
My data is partitioned by a sourceId and metric. A source is usually active up to a year, after which there are no additional writes for the partition and reads become scarce. So although this is not an explicit time component, it is time based; will that suffice?

If I use a week bucket we will be able to serve reads for the last few days from one file and for the last month from ~5, which are the most common queries. Do you think a month bucket is a good idea? That would allow reading from one file most of the time, but the size of each SSTable will be ~5 times bigger.

When changing the compaction strategy via JMX, do I need to issue the ALTER TABLE command at the end so it will be reflected in the schema, or is that taken care of automatically? (I am using Cassandra 3.11.11)

Thanks a lot for your help.








From: Jeff Jirsa 
Sent: Tuesday, September 14, 2021 4:51 PM
To: cassandra 
Subject: Re: TWCS on Non TTL Data



On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna 
<isa...@xsense.co> wrote:
Hi
I have a table that stores time series data. The data is not TTLed since we want to retain it for the foreseeable future, and there are no updates or deletes (deletes could happen rarely in case some scrambled data reached the table, but it's extremely rare).
Usually we do a constant write of incoming data to the table, ~5 million rows a day, mostly newly generated data from the past week, but we also get old data that got stuck somewhere, though not that often. Usually our reads are for the most recent data, the last one to three months, but we do fetch older data as well for specific time periods in the past.
Lately we have been facing performance trouble with this table (see the histogram below); when compaction is working on the table, performance even drops to 10-20 seconds!
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         215.00    17.08          89970.66      1916            149
75%         446.00    24.60          223875.79     2759            215
95%         535.00    35.43          464228.84     8239            642
98%         642.00    51.01          668489.53     24601           1916
99%         642.00    73.46          962624.93     42510           3311
Min         0.00      2.30           10090.81      43              0
Max         770.00    1358.10        2395318.86    5839588         454826

As you can see, we are scanning hundreds of SSTables. It turns out we are using DTCS (min: 4, max: 32); the table folder contains ~33K files totaling ~130GB per node (cleanup pending after increasing the cluster), and compaction takes a very long time to complete.
As I understand it, DTCS is deprecated, so my questions:

  1.  Should we switch to TWCS even though our data is not TTLed? Since we do not delete at all, can we still use it? Will it improve performance?
It will probably be better than DTCS here, but you'll still have potentially 
lots of sstables over time.

Lots of sstables in itself isn't a big deal; the problem comes from scanning more than a handful on each read. Does your table have some form of date bucketing to avoid touching old data files?



  2.  If we should switch, I am thinking of using a time window of a week; this way a read will scan tens of SSTables instead of hundreds today. Does that sound reasonable?
10s is better than hundreds, but it's still a lot.


  3.  Is there a recommended size of a window bucket in terms of disk space?
When I wrote it, I wrote it for a use case that had 30 windows over the whole 
set of data. Since then, I've seen it used with anywhere from 5 to 60 buckets.
With no TTL, you're effectively doing infinite buckets. So the only way to 
ensure you're not touching too many sstables is to put the date (in some form) 
into the partition key and let the database use that (+bloom filters) to avoid 
reading too many sstables.
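An editorial sketch of that date-bucketing idea (names and the month granularity are assumptions, not from this thread): the partition key gains a coarse time component, so a read for a given period only ever touches a bounded set of partitions and the SSTables that contain them:

CREATE TABLE my_keyspace.readings_bucketed (
    source_id uuid,
    metric text,
    month text,  -- e.g. '2021-09', computed by the application at write time
    ts timestamp,
    value double,
    PRIMARY KEY ((source_id, metric, month), ts)
);

-- a one-month query then targets a single partition:
SELECT ts, value FROM my_keyspace.readings_bucketed
WHERE source_id = 11111111-2222-3333-4444-555555555555
  AND metric = 'temperature'
  AND month = '2021-09'
  AND ts >= '2021-09-01' AND ts < '2021-10-01';

Queries spanning several months would issue one such query per bucket.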

  4.  If TWCS is not a good idea, should I switch to STCS instead? Could that yield better performance than the current situation?
LCS will give you better read performance. STCS will probably be better than 
DTCS given the 215 sstable p50 you're seeing (which is crazy btw, I'm surprised 
you're not just OOMing)


  5.  What are the risks of changing the compaction strategy on a production system? Can it be done on the fly, or is it better to go through a full test and backup cycle?

The risk is you trigger a ton of compactions which drops the performance of the 
whole system all at once and your front door queries all time out.
You can approach this a few ways:
- Use the JMX endpoint to change compaction on one instance at a time (rather 
than doing it in the schema), which lets you control how many nodes are 
re-writing all their data at any given point in time
- You ca

TWCS on Non TTL Data

2021-09-14 Thread Isaeed Mohanna
Hi
I have a table that stores time series data. The data is not TTLed since we want to retain it for the foreseeable future, and there are no updates or deletes (deletes could happen rarely in case some scrambled data reached the table, but it's extremely rare).
Usually we do a constant write of incoming data to the table, ~5 million rows a day, mostly newly generated data from the past week, but we also get old data that got stuck somewhere, though not that often. Usually our reads are for the most recent data, the last one to three months, but we do fetch older data as well for specific time periods in the past.
Lately we have been facing performance trouble with this table (see the histogram below); when compaction is working on the table, performance even drops to 10-20 seconds!
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         215.00    17.08          89970.66      1916            149
75%         446.00    24.60          223875.79     2759            215
95%         535.00    35.43          464228.84     8239            642
98%         642.00    51.01          668489.53     24601           1916
99%         642.00    73.46          962624.93     42510           3311
Min         0.00      2.30           10090.81      43              0
Max         770.00    1358.10        2395318.86    5839588         454826

As you can see, we are scanning hundreds of SSTables. It turns out we are using DTCS (min: 4, max: 32); the table folder contains ~33K files totaling ~130GB per node (cleanup pending after increasing the cluster), and compaction takes a very long time to complete.
As I understand it, DTCS is deprecated, so my questions:

  1.  Should we switch to TWCS even though our data is not TTLed? Since we do not delete at all, can we still use it? Will it improve performance?
  2.  If we should switch, I am thinking of using a time window of a week; this way a read will scan tens of SSTables instead of hundreds today. Does that sound reasonable?
  3.  Is there a recommended size of a window bucket in terms of disk space?
  4.  If TWCS is not a good idea, should I switch to STCS instead? Could that yield better performance than the current situation?
  5.  What are the risks of changing the compaction strategy on a production system? Can it be done on the fly, or is it better to go through a full test and backup cycle?

All input will be appreciated,
Thank you