Re: Query around Data Modelling -2

2022-07-01 Thread Bowen Song via user
I don't recall ever seeing any recommendation to periodically
run major compactions. Can you share the source of your information?


During a major compaction, the server will be under heavy load because it
needs to rewrite ALL sstables. This actually hurts read
performance while the compaction is running.


The most important factor in read performance is the amount of data each
node has to scan in order to complete the read query. Large partitions,
too many tombstones, partitions spread across too many sstables, etc. all
hurt performance. You will need to find the bottleneck and act on
it in order to improve read performance.
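As a rough illustration of how to spot where that data is: Cassandra keeps
per-token-range estimates in the system.size_estimates table (present in 3.11
and 4.x), which can be queried from cqlsh. The keyspace and table names below
are placeholders, and the figures are periodically refreshed approximations,
not exact values:

    -- Approximate partition counts and mean partition sizes (bytes) per token range.
    -- 'my_keyspace' and 'my_table' are placeholder names.
    SELECT range_start, range_end, partitions_count, mean_partition_size
    FROM system.size_estimates
    WHERE keyspace_name = 'my_keyspace'
      AND table_name = 'my_table';

nodetool tablehistograms and nodetool tablestats give a more direct view of
partition size percentiles, SSTables touched per read, and tombstones scanned
per slice.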


Artificially spreading the data from one LCS table into many tables with 
identical schema is not likely to improve the read performance. The only 
benefit you get is more compaction parallelisation, and that may further 
hurt the read performance if the bottleneck is CPU usage, disk IO, or GC.


If you know the table is heavily read, and you have a performance issue 
with that, maybe it's time to redesign the table schema and optimise for 
the most frequently used read queries.
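For illustration only, a query-driven redesign usually means shaping a table
around the hottest read. A minimal sketch with hypothetical names (not the
actual schema discussed in this thread) could look like:

    -- Hypothetical table shaped around one frequent query:
    -- "fetch the latest entries for an item", newest first.
    CREATE TABLE IF NOT EXISTS my_keyspace.item_events_by_time (
        item_id   text,
        event_ts  timestamp,
        col1      text,
        col2      text,
        PRIMARY KEY ((item_id), event_ts)
    ) WITH CLUSTERING ORDER BY (event_ts DESC);

    -- The read this layout is optimised for:
    SELECT col1, col2 FROM my_keyspace.item_events_by_time
    WHERE item_id = 'some-id' LIMIT 20;

The point is that the partition key matches exactly what the query filters on,
so a read touches one reasonably sized partition instead of scanning wide ones.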


On 01/07/2022 11:29, MyWorld wrote:

 Michiel, this is not our use case. Since our data is not time
series, there is no TTL in our case.


Bowen, I think it is generally recommended to run a major
compaction once a week for better read performance.


Re: Query around Data Modelling -2

2022-07-01 Thread MyWorld
 Michiel, this is not our use case. Since our data is not time series,
there is no TTL in our case.

Bowen, I think it is generally recommended to run a major
compaction once a week for better read performance.


RE: Query around Data Modelling -2

2022-06-30 Thread Michiel Saelen
Hi,

We did run a compaction job every week in the past to keep the disk space used
under control, as we had mainly data in the table that needs to expire with TTL
and were also using levelled compaction.
In our case we had different TTLs in the same table and the partitions were
spread over multiple SSTables; as the partitions were never closing and
therefore kept receiving changes, we ended up with repair actions that had to
cover a lot of SSTables, which is heavy on memory and CPU.
By changing the compaction strategy to
TWCS<https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html>,
splitting the table into different tables with their own TTL, and adding a part
to the partition key (e.g. the day of the year) to close the partitions, so
they can be “marked” as repaired, we were able to get rid of these heavy
compaction actions.
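
For readers unfamiliar with the pattern, a minimal sketch of a time-bucketed
TWCS table along these lines (the names, bucket granularity, and TTL below are
illustrative assumptions, not the actual schema from this thread):

    -- Hypothetical table: the day bucket in the partition key closes each
    -- partition once its day has passed, and TWCS groups SSTables by time
    -- window so fully expired windows can be dropped as a whole.
    CREATE TABLE IF NOT EXISTS my_keyspace.events_by_day (
        source_id  text,
        day        date,        -- e.g. the day of the year the row belongs to
        event_ts   timestamp,
        payload    text,
        PRIMARY KEY ((source_id, day), event_ts)
    ) WITH default_time_to_live = 604800   -- 7 days; one TTL per table
      AND compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '1'
      };

Keeping one TTL per table keeps each time window homogeneous, which is what
lets whole SSTables expire together.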

Not sure if you have the same use case, just wanted to share this info.

Kind regards,
Michiel


Re: Query around Data Modelling -2

2022-06-30 Thread Bowen Song

And why do you do that?

On 30/06/2022 16:35, MyWorld wrote:

We run a major compaction once a week


Re: Query around Data Modelling -2

2022-06-30 Thread MyWorld
We run a major compaction once a week



Re: Query around Data Modelling -2

2022-06-30 Thread Bowen Song

I have noticed this "running a weekly repair and compaction job".

What do you mean by a weekly compaction job? Have you disabled the
auto-compaction on the table and are relying on weekly scheduled
compactions? Or are you running weekly major compactions? Neither of these
sounds right.
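
For context, disabling automatic compactions on a table is an explicit,
per-table setting (it can also be toggled at runtime with nodetool
disableautocompaction); a sketch with placeholder names:

    -- Placeholder names; the 'enabled' sub-option turns off the table's
    -- automatic (minor) compactions, which is rarely what you want.
    ALTER TABLE my_keyspace.my_table
    WITH compaction = {'class': 'LeveledCompactionStrategy', 'enabled': 'false'};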


On 30/06/2022 15:03, MyWorld wrote:

Hi all,

Another query around data modelling.

We have an existing table with the below structure:
Table(PK, CK, col1, col2, col3, col4, col5)

Now each PK here has 1k - 10k clustering keys. Each partition's size ranges
from 10 MB to 80 MB. We have over 100 million partitions overall. We have also
set levelled compaction in place so as to get better read response time.


We are currently on the 3.11.x version of Cassandra. When running a weekly
repair and compaction job, this model, because of levelled compaction
(occupied up to Level 3), consumes heavy CPU resources and impacts db
performance.


Now what if we divide this table into 10, with each table containing 1/10 of
the partitions? Each table would then be limited to levelled compaction up to
Level 2. I think this would ease the read as well as the compaction
work.


What is your opinion on this?
Even if we upgrade to version 4.0, is the second model OK?
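
To make the description above concrete, a hedged reconstruction of such a
table in CQL (column names and types are placeholders; only the wide-partition
shape and the levelled compaction setting are taken from the thread):

    -- Placeholder reconstruction of Table(PK, CK, col1..col5) with levelled
    -- compaction; each pk partition holds 1k-10k ck rows (10-80 MB as described).
    CREATE TABLE IF NOT EXISTS my_keyspace.my_table (
        pk    text,
        ck    text,
        col1  text,
        col2  text,
        col3  text,
        col4  text,
        col5  text,
        PRIMARY KEY ((pk), ck)
    ) WITH compaction = {'class': 'LeveledCompactionStrategy'};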


Re: Query around Data Modelling -2

2022-06-30 Thread MyWorld
Hi Jeff,
We are running repair with the -pr option.

You are right, it would have no or very minimal impact on reads (considering
the fact that data now has to be read from 2 levels instead of 3). But my guess
is there is no negative impact from this model 2.


Re: Query around Data Modelling -2

2022-06-30 Thread Jeff Jirsa
How are you running repair? -pr? Or -st/-et?

4.0 gives you real incremental repair which helps. Splitting the table won’t 
make reads faster. It will increase the potential parallelization of 
compaction. 
