Re: Cassandra 4.0 upgrade - Upgradesstables

2022-08-16 Thread Jai Bheemsen Rao Dhanwada
Thank you

On Tue, Aug 16, 2022 at 11:48 AM C. Scott Andreas wrote:

> No downside at all for 3.x -> 4.x (however, Cassandra 3.x reading 2.1
> SSTables incurred a performance hit).
>
> Many users of Cassandra don't run upgradesstables after 3.x -> 4.x
> upgrades at all. It's not necessary to run until a hypothetical future time
> if/when support for reading Cassandra 3.x SSTables is removed from
> Cassandra. One of the most common reasons to avoid running upgradesstables
> is that doing so causes 100% churn of the data files, meaning your
> backup processes will need to upload a full copy of the data. Allowing
> SSTables to organically churn into the new version via compaction avoids
> this.
>
> If you're upgrading from 3.x to 4.x, don't feel like you have to run
> upgradesstables - but doing so does avoid the need to run it in a hypothetical
> distant future.
>
> – Scott
>
> On Aug 16, 2022, at 6:32 AM, Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
>
> Thank you Erick,
>
> > it is going to be single-threaded by default so it will take a while to
> > get through all the sstables on dense nodes
> Is there any downside if upgradesstables takes longer (for example, 1-2
> days), other than the I/O?
>
> Also, when does upgradesstables get triggered? After every node is
> upgraded, or will it kick in only when all the nodes in the cluster are
> upgraded to 4.0.x?
>
> On Tue, Aug 16, 2022 at 2:12 AM Erick Ramirez wrote:
>
>> As convenient as it is, there are a few caveats and it isn't a silver
>> bullet. The automatic feature will only kick in if there are no other
>> compactions scheduled. Also, it is going to be single-threaded by default
>> so it will take a while to get through all the sstables on dense nodes.
>>
>> In contrast, you'll have a bit more control if you manually upgrade the
>> sstables. For example, you can schedule the upgrade during low traffic
>> periods so reads are not competing with compactions for IO. Cheers!
>>


Re: Cassandra 4.0 upgrade - Upgradesstables

2022-08-16 Thread C. Scott Andreas

No downside at all for 3.x -> 4.x (however, Cassandra 3.x reading 2.1 SSTables incurred a
performance hit).

Many users of Cassandra don't run upgradesstables after 3.x -> 4.x upgrades at all. It's not
necessary to run until a hypothetical future time if/when support for reading Cassandra 3.x
SSTables is removed from Cassandra. One of the most common reasons to avoid running
upgradesstables is that doing so causes 100% churn of the data files, meaning your backup
processes will need to upload a full copy of the data. Allowing SSTables to organically churn
into the new version via compaction avoids this.

If you're upgrading from 3.x to 4.x, don't feel like you have to run upgradesstables - but doing
so does avoid the need to run it in a hypothetical distant future.

– Scott

On Aug 16, 2022, at 6:32 AM, Jai Bheemsen Rao Dhanwada wrote:

> Thank you Erick,
>
> > it is going to be single-threaded by default so it will take a while to
> > get through all the sstables on dense nodes
>
> Is there any downside if upgradesstables takes longer (for example, 1-2 days), other than the I/O?
>
> Also, when does upgradesstables get triggered? After every node is upgraded, or will it kick in
> only when all the nodes in the cluster are upgraded to 4.0.x?
>
> On Tue, Aug 16, 2022 at 2:12 AM Erick Ramirez wrote:
>
>> As convenient as it is, there are a few caveats and it isn't a silver bullet. The automatic
>> feature will only kick in if there are no other compactions scheduled. Also, it is going to be
>> single-threaded by default so it will take a while to get through all the sstables on dense
>> nodes.
>>
>> In contrast, you'll have a bit more control if you manually upgrade the sstables. For example,
>> you can schedule the upgrade during low traffic periods so reads are not competing with
>> compactions for IO. Cheers!
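A rough way to watch the organic churn Scott describes, assuming the default data directory and
the standard "big"-format file naming (3.x writes m* SSTable versions such as md/me, while 4.0
writes n* versions such as nb), is to count data files by version prefix with GNU find:

# counts SSTable data files per format version; assumes the default data path
find /var/lib/cassandra/data -name '*-big-Data.db' -printf '%f\n' | cut -d- -f1 | sort | uniq -c

Once no m* files remain, compaction has rewritten everything and a future upgradesstables run
would have nothing left to do.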

Re: Question about num_tokens

2022-08-16 Thread Jai Bheemsen Rao Dhanwada
Thanks for the response and details. I am just curious about the below
statement mentioned in the doc. I am pretty confident that my clusters are
going to grow to 100+ nodes (same DC or combining all DCs). I am just
concerned that the doc says it is *not recommended for clusters over 50
nodes*.

16: Best for heavily elastic clusters which expand and shrink regularly, but
may have availability issues with larger clusters. Not recommended for
clusters over 50 nodes.
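For reference, a minimal cassandra.yaml sketch for bootstrapping new nodes with 16 tokens,
assuming a local replication factor of 3 and the 4.0 option name
allocate_tokens_for_local_replication_factor (3.11 uses allocate_tokens_for_keyspace instead):

# applies only to nodes bootstrapped after the change; existing nodes keep their tokens
num_tokens: 16
# assumption: keyspaces in the local DC use RF=3
allocate_tokens_for_local_replication_factor: 3

Token counts can't be changed on a live node, so moving an existing cluster from 256 to 16 is
typically done by standing up a new DC with the new settings, rebuilding into it, and
decommissioning the old one.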

On Sun, Mar 13, 2022 at 11:34 PM Elliott Sims  wrote:

> More tokens:  better data distribution, more expensive repairs, higher
> probability of a multi-host outage taking some data offline and affecting
> availability.
>
> I think with >100 nodes the repair times and availability improvements
> make a strong case for 16 tokens even though it means you'll need more
> total raw space.
>
> Switching from 256 to 16 vnodes definitely will make data distribution
> worse.  I'm not sure "hot spot" is the right description so much as a wider
> curve.  I've got one cluster that hasn't been migrated from 256 to 16, and
> it has about a 6% delta between the smallest and largest nodes instead of
> more like 20% on the 16-vnode clusters.  The newer
> allocate_tokens_for_keyspace and (better)
> allocate_tokens_for_replication_factor options help limit the data
> distribution issues, but don't totally eliminate them.
>
> On the other hand, the 16-vnode cluster takes less than half as long to
> complete repairs via Reaper.  It also spends more time on GC, though I
> can't tell whether that's due to vnodes or other differences.
>
> On Sun, Mar 13, 2022 at 5:59 PM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
>> Hello Team,
>>
>> I am currently using num_tokens: 256 (the default in 3.11.X) for my
>> clusters and trying to understand the advantages vs disadvantages of
>> changing it to 16 (I believe 16 is the new recommended value). As per the
>> Cassandra documentation, 16 is not recommended for clusters over 50 nodes.
>>
>>> Best for heavily elastic clusters which expand and shrink regularly, but
>>> may have availability issues with larger clusters. Not recommended for
>>> clusters over 50 nodes.
>>
>>
>> I have a few questions.
>>
>>
>>    1. What are the general recommendations for a production cluster
>>    which is > 100 nodes and is heavily elastic in terms of adding and
>>    removing nodes?
>>    2. If I am switching from 256 -> 16 tokens, does this cause any
>>    hotspots by having the data concentrated on only a few nodes and not
>>    distributed equally across all the nodes?
>>
>>


cell vs row timestamp tie resolution

2022-08-16 Thread Andrey Zapariy
Hello Cassandra users!

I'm dealing with unexpected behavior of timestamp tie resolution for inserts
that carry the same timestamp. At least, unexpected for me.
The following simple repro under Cassandra 3.11.4 illustrates the question:

CREATE KEYSPACE the_test WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '2'}  AND durable_writes = true;
CREATE TABLE the_test.case (id int, sort int, body text, size int, PRIMARY
KEY (id, sort)) WITH CLUSTERING ORDER BY (sort ASC);
INSERT INTO the_test.case (id, sort, body, size) VALUES (1, 2, 'foo foo',
7) USING TIMESTAMP 1660596312240;
INSERT INTO the_test.case (id, sort, body, size) VALUES (1, 2, 'flap flap',
9) USING TIMESTAMP 1660596312240;

After these two inserts I expect that either the combination <'foo foo',7>
or the combination <'flap flap',9> would survive.
But the select
select id, sort, body, size from the_test.case where id=1 and sort=2;
gives a rather uncomfortable result:

 id | sort | body    | size
----+------+---------+------
  1 |    2 | foo foo |    9
Essentially, it shows that timestamp tie resolution is performed on a per-cell
basis, and not on a per-row basis, as I was expecting.

My questions are:
Am I right about the way Cassandra resolves timestamp ties?
Or is there a way to configure Cassandra to perform per row resolution?

Flushing the data to SSTables and dumping them suggests that these inserts are
stored as rows. So, naively, I hope there is a way to make the whole row insert
survive.
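For what it's worth, the usual workaround is to make the write that should win carry a strictly
higher timestamp, so no tie-breaking is needed at all. A minimal CQL sketch, assuming the client
can assign the timestamps (1660596312241 below is simply the original timestamp plus one):

-- the later logical write gets a strictly higher timestamp, so all of its cells win together
INSERT INTO the_test.case (id, sort, body, size) VALUES (1, 2, 'flap flap', 9) USING TIMESTAMP 1660596312241;

With distinct timestamps the row reads back as <'flap flap',9>, and the per-cell value comparison
never comes into play.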


Re: Cassandra 4.0 upgrade - Upgradesstables

2022-08-16 Thread Jai Bheemsen Rao Dhanwada
Thank you Erick,

> it is going to be single-threaded by default so it will take a while to
> get through all the sstables on dense nodes
Is there any downside if upgradesstables takes longer (for example, 1-2
days), other than the I/O?

Also, when does upgradesstables get triggered? After every node is
upgraded, or will it kick in only when all the nodes in the cluster are
upgraded to 4.0.x?

On Tue, Aug 16, 2022 at 2:12 AM Erick Ramirez wrote:

> As convenient as it is, there are a few caveats and it isn't a silver
> bullet. The automatic feature will only kick in if there are no other
> compactions scheduled. Also, it is going to be single-threaded by default
> so it will take a while to get through all the sstables on dense nodes.
>
> In contrast, you'll have a bit more control if you manually upgrade the
> sstables. For example, you can schedule the upgrade during low traffic
> periods so reads are not competing with compactions for IO. Cheers!


Re: Cassandra 4.0 upgrade - Upgradesstables

2022-08-16 Thread Erick Ramirez
As convenient as it is, there are a few caveats and it isn't a silver
bullet. The automatic feature will only kick in if there are no other
compactions scheduled. Also, it is going to be single-threaded by default
so it will take a while to get through all the sstables on dense nodes.

In contrast, you'll have a bit more control if you manually upgrade the
sstables. For example, you can schedule the upgrade during low traffic
periods so reads are not competing with compactions for IO. Cheers!
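A minimal sketch of the manual route, assuming nodetool is on the PATH and your version supports
the -j/--jobs flag (it controls how many sstables are rewritten concurrently); my_keyspace is just
a placeholder name:

# run during a low-traffic window; -j 2 caps concurrent sstable upgrades
nodetool upgradesstables -j 2 my_keyspace

Going keyspace by keyspace in quiet periods keeps the extra compaction I/O away from peak read
traffic, which is exactly the kind of control the automatic path doesn't give you.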
