RE: Best compaction strategy for rarely used data

2022-12-29 Thread Durity, Sean R via user
If there isn’t a TTL and timestamp on the data, I’m not sure of the benefits of 
TWCS for this use case. I would stick with size-tiered. At some point you will 
end up with large sstables (like 1 TB) that won’t compact because there are not 
4 similar-sized ones available to compact together (assuming default parameters 
for STCS). And if your data is ever-growing and never deleted, you will be 
adding nodes to handle the extra data as time goes by (and running cleanup on 
the existing nodes). For me, the backup strategy shouldn’t drive the rest.
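
For reference, this is roughly what the STCS defaults look like if you spell 
them out (keyspace/table names here are made up; min_threshold = 4 is the value 
I mean above):

  -- Sketch only: hypothetical table, showing the STCS defaults explicitly.
  -- A size bucket is only compacted once min_threshold (4) SSTables of
  -- similar size exist, so a single very large SSTable can sit uncompacted
  -- for a long time.
  ALTER TABLE my_keyspace.audit_data
    WITH compaction = {
      'class': 'SizeTieredCompactionStrategy',
      'min_threshold': '4',
      'max_threshold': '32'
    };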


Sean R. Durity

From: Paul Chandler 
Sent: Thursday, December 29, 2022 4:51 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Best compaction strategy for rarely used data


Hi Lapo

Take a look at TWCS, I think that could help your use case: 
https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html 

Regards

Paul Chandler
Sent from my iPhone

On 29 Dec 2022, at 08:55, Lapo Luchini <l...@lapo.it> wrote:
Hi, I have a table which gets (a lot of) data that is written once and very 
rarely read (it is used for data that is mandatory for regulatory reasons), and 
almost never deleted.

I'm using the default STCS as at the time I didn't know any better, but 
SSTable sizes are getting huge, which is a problem both because they are 
approaching the size of the available disk and because I'm using a 
snapshot-based system to back up the node (so compacting a huge SSTable into an 
even bigger one generates a lot of traffic for mostly-old data).

I'm thinking about switching to LCS (mainly to solve the size issue), but I 
read that it is "optimized for read heavy workloads […] not a good choice for 
immutable time series data". Given that I don't really care about write or read 
speed, but would like SSTable sizes to have an upper limit, would this strategy 
still be the best?

PS: Googling around, a strategy called "incremental compaction" (ICS) keeps 
coming up in results, but that's only available in ScyllaDB, right?

--
Lapo Luchini
l...@lapo.it




Re: Best compaction strategy for rarely used data

2022-12-29 Thread Paul Chandler
Hi Lapo

Take a look at TWCS, I think that could help your use case: 
https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
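
Switching is a single ALTER TABLE; a rough sketch, assuming a hypothetical 
table and a weekly window (you'd pick a window that keeps the total number of 
SSTables reasonable):

  -- Sketch only: table name and window size are placeholders.
  ALTER TABLE my_keyspace.audit_data
    WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': '7'
    };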

Regards 

Paul Chandler

Sent from my iPhone

> On 29 Dec 2022, at 08:55, Lapo Luchini  wrote:
> 
> Hi, I have a table which gets (a lot of) data that is written once and very 
> rarely read (it is used for data that is mandatory for regulatory reasons), 
> and almost never deleted.
> 
> I'm using the default STCS as at the time I didn't know any better, but 
> SSTable sizes are getting huge, which is a problem both because they are 
> approaching the size of the available disk and because I'm using a 
> snapshot-based system to back up the node (so compacting a huge SSTable into 
> an even bigger one generates a lot of traffic for mostly-old data).
> 
> I'm thinking about switching to LCS (mainly to solve the size issue), but I 
> read that it is "optimized for read heavy workloads […] not a good choice for 
> immutable time series data". Given that I don't really care about write or 
> read speed, but would like SSTable sizes to have an upper limit, would this 
> strategy still be the best?
> 
> PS: Googling around, a strategy called "incremental compaction" (ICS) keeps 
> coming up in results, but that's only available in ScyllaDB, right?
> 
> -- 
> Lapo Luchini
> l...@lapo.it
> 


Best compaction strategy for rarely used data

2022-12-29 Thread Lapo Luchini
Hi, I have a table which gets (a lot of) data that is written once and 
very rarely read (it is used for data that is mandatory for regulatory 
reasons), and almost never deleted.


I'm using the default STCS as at the time I didn't know any better, but 
SSTable sizes are getting huge, which is a problem both because they are 
approaching the size of the available disk and because I'm using a 
snapshot-based system to back up the node (so compacting a huge SSTable 
into an even bigger one generates a lot of traffic for mostly-old data).


I'm thinking about switching to LCS (mainly to solve the size issue), 
but I read that it is "optimized for read heavy workloads […] not a good 
choice for immutable time series data". Given that I don't really care 
about write or read speed, but would like SSTable sizes to have an upper 
limit, would this strategy still be the best?
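
Something like the following is what I have in mind, if I understand the 
LCS options correctly (table name is made up; sstable_size_in_mb is the 
option that caps the size of each SSTable, 160 MB by default):

  -- Sketch only: hypothetical table, capping SSTables at roughly 320 MB each.
  ALTER TABLE my_keyspace.audit_data
    WITH compaction = {
      'class': 'LeveledCompactionStrategy',
      'sstable_size_in_mb': '320'
    };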


PS: Googling around, a strategy called "incremental compaction" (ICS) 
keeps coming up in results, but that's only available in ScyllaDB, right?


--
Lapo Luchini
l...@lapo.it