RE: [EXTERNAL] Re: Upgrade strategy for high number of nodes

Durity, Sean R Mon, 02 Dec 2019 07:23:33 -0800

All my upgrades are without downtime for the application. Yes, do the binary 
upgrade one node at a time. Then run upgradesstables on as many nodes as your 
app load can handle (maybe you can point the app to a different DC, while 
another DC is doing upgradesstables). Upgradesstables doesn’t cause downtime – 
it just increases the IO load on the nodes executing the upgradesstables. I try 
to get it done as quickly as possible, because I suspend streaming operations 
(repairs, etc.) until the sstable rewrites are completed.
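
A minimal sketch of the two phases described above: binaries upgraded one node at a time, then upgradesstables run in batches. The host names, function names, and the `rewrite_parallelism` knob are hypothetical. The sketch only builds a plan and the `nodetool` command lines; it does not execute anything.

```python
from typing import Dict, List


def plan_upgrade(nodes: List[str], rewrite_parallelism: int) -> Dict[str, List[List[str]]]:
    """Build a two-phase plan: serial binary upgrades, then batched sstable rewrites."""
    if rewrite_parallelism < 1:
        raise ValueError("rewrite_parallelism must be at least 1")
    # Phase 1: the binary upgrade is done strictly one node at a time.
    binary_phase = [[node] for node in nodes]
    # Phase 2: upgradesstables runs on as many nodes at once as the app load
    # can tolerate. That number is an assumption the operator has to supply.
    rewrite_phase = [
        nodes[i:i + rewrite_parallelism]
        for i in range(0, len(nodes), rewrite_parallelism)
    ]
    return {"binary": binary_phase, "upgradesstables": rewrite_phase}


def rewrite_commands(batch: List[str]) -> List[List[str]]:
    """Command lines for one batch; repairs stay suspended until all batches finish."""
    return [["nodetool", "-h", host, "upgradesstables"] for host in batch]


if __name__ == "__main__":
    plan = plan_upgrade(["node1", "node2", "node3", "node4", "node5"], rewrite_parallelism=2)
    for batch in plan["upgradesstables"]:
        print(rewrite_commands(batch))
```

Keeping the plan separate from execution makes it easy to review the batches before touching production.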

Sean Durity

From: Shishir Kumar <shishirroy2...@gmail.com>
Sent: Saturday, November 30, 2019 1:00 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Upgrade strategy for high number of nodes

Thanks for the pointer. We haven't changed the data model much in a long time, 
so before applying workarounds (scrub) it is worth understanding the root cause 
of the problem.
This might be the reason why running upgradesstables in parallel was not 
recommended.
-Shishir
On Sat, 30 Nov 2019, 10:37 Jeff Jirsa <jji...@gmail.com> wrote:
Scrub really shouldn’t be required here.

If there’s ever a step that reports corruption, it’s either a very very old 
table where you dropped columns previously or did something “wrong” in the past 
or a software bug. The old dropped column really should be obvious in the stack 
trace - anything else deserves a bug report.

It’s unfortunate that people jump to just scrubbing the unreadable data - would 
appreciate an anonymized JIRA if possible. Alternatively work with your vendor 
to make sure they don’t have bugs in their readers somehow.

On Nov 29, 2019, at 8:58 PM, Shishir Kumar <shishirroy2...@gmail.com> wrote:

Some more background. We are planning (and have tested) a binary upgrade across 
all nodes without downtime. The next step is running upgradesstables. The C* 
file format and version change (from format big, version mc to format bti, 
version aa; refer to
https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/tools/toolsSStables/ToolsSSTableupgrade.html
for the upgrade from DSE 5.1 to 6.x). These underlying changes explain why the 
upgrade takes so much time.
Running upgradesstables in parallel across racks - This is where I am not sure 
of the impact of running in parallel (the documentation recommends running it 
on one node at a time). During upgradesstables there are scenarios where it 
reports file corruption and hence requires a corrective step, i.e. scrub. 
Because of sstable corruption, nodes at times go down or show high CPU usage 
(~100%). Performing the above in parallel without downtime might result in more 
inconsistency across nodes. We have not tested this scenario, so we need the 
group's help if anyone has done a similar upgrade in the past (i.e. the 
scenarios and complexity that need to be considered, and why the guideline 
recommends running upgradesstables on one node at a time).
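
One way to track how much rewriting is left on a node is to count data files by their version prefix. The sketch below assumes sstable component names of the form `<version>-<generation>-<format>-Data.db` (for example `mc-1-big-Data.db` before the rewrite and `aa-1-bti-Data.db` after). That layout and the function names are assumptions for illustration; check them against the files in your own data directories.

```python
from collections import Counter
from typing import Iterable


def sstable_versions(filenames: Iterable[str]) -> Counter:
    """Count Data.db components per sstable version prefix."""
    counts: Counter = Counter()
    for name in filenames:
        parts = name.split("-")
        # Assumed shape: <version>-<generation>-<format>-Data.db
        if len(parts) == 4 and parts[3] == "Data.db":
            counts[parts[0]] += 1
    return counts


def pending_rewrite(filenames: Iterable[str], target_version: str = "aa") -> int:
    """Number of data files that are not yet on the target version."""
    counts = sstable_versions(filenames)
    return sum(n for version, n in counts.items() if version != target_version)


if __name__ == "__main__":
    # Hypothetical directory listing, mixing old and new files.
    listing = [
        "mc-1-big-Data.db",
        "mc-1-big-Index.db",
        "mc-2-big-Data.db",
        "aa-3-bti-Data.db",
    ]
    print(sstable_versions(listing))
    print(pending_rewrite(listing))
```

In practice the listing would come from something like `os.listdir()` on a table's data directory.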
-Shishir

On Fri, Nov 29, 2019 at 11:52 PM Josh Snyder <j...@code406.com> wrote:
Hello Shishir,

It shouldn't be necessary to take downtime to perform upgrades of a Cassandra 
cluster. It sounds like the biggest issue you're facing is the upgradesstables 
step. upgradesstables is not strictly necessary before a Cassandra node 
re-enters the cluster to serve traffic; in my experience it is purely for 
optimizing the performance of the database once the software upgrade is 
complete. I recommend trying out an upgrade in a test environment without using 
upgradesstables, which should bring the 5 hours per node down to just a few 
minutes.

If you're running NetworkTopologyStrategy and you want to optimize further, you 
could consider performing the upgrade on multiple nodes within the same rack in 
parallel. When correctly configured, NetworkTopologyStrategy can protect your 
database from an outage of an entire rack. So performing an upgrade on a few 
nodes at a time within a rack is the same as a partial rack outage, from the 
database's perspective.
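
The rack-at-a-time idea can be sketched as a small batching helper. The node-to-rack mapping, the node names, and the `per_batch` limit are hypothetical. The only property the sketch enforces is that no batch ever spans two racks.

```python
from collections import defaultdict
from typing import Dict, List


def rack_batches(node_racks: Dict[str, str], per_batch: int) -> List[List[str]]:
    """Group nodes into upgrade batches that never mix racks."""
    if per_batch < 1:
        raise ValueError("per_batch must be at least 1")
    by_rack: Dict[str, List[str]] = defaultdict(list)
    for node, rack in node_racks.items():
        by_rack[rack].append(node)
    batches: List[List[str]] = []
    # Finish one rack completely before moving to the next, so that at any
    # moment the nodes being upgraded look like a partial outage of one rack.
    for rack in sorted(by_rack):
        members = sorted(by_rack[rack])
        for i in range(0, len(members), per_batch):
            batches.append(members[i:i + per_batch])
    return batches


if __name__ == "__main__":
    topology = {"n1": "rack1", "n2": "rack1", "n3": "rack2", "n4": "rack1"}
    print(rack_batches(topology, per_batch=2))
```

This only helps if replicas are actually spread across racks, which depends on the keyspace and snitch configuration.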

Have a nice upgrade!

Josh

On Fri, Nov 29, 2019 at 7:22 AM Shishir Kumar <shishirroy2...@gmail.com> wrote:
Hi,

Need input on a Cassandra upgrade strategy for the setup below:
1. We have datacenters across 4 geographies (multiple isolated deployments in 
each DC).
2. The number of Cassandra nodes in each deployment is between 6 and 24.
3. Data volume on each node is between 150 and 400 GB.
4. All production environments have DR set up.
5. During the upgrade we do not want downtime.

We are planning to go for a stack upgrade, but upgradesstables is taking 
approx. 5 hours per node (if data volume is approx. 200 GB).
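
A rough back-of-the-envelope estimate of the total rewrite time, using the 5 hours per node quoted above. It assumes every node in a batch takes the same time and that there is no overhead between batches. The function name and the `parallelism` parameter are hypothetical.

```python
import math


def upgrade_hours(nodes: int, hours_per_node: float = 5.0, parallelism: int = 1) -> float:
    """Total wall-clock hours if `parallelism` nodes are rewritten at once."""
    if nodes < 0 or parallelism < 1:
        raise ValueError("nodes must be >= 0 and parallelism >= 1")
    # Each batch lasts as long as one node; batches run back to back.
    batches = math.ceil(nodes / parallelism)
    return batches * hours_per_node


if __name__ == "__main__":
    print(upgrade_hours(24))                 # 24 nodes in sequence: 120.0 hours
    print(upgrade_hours(24, parallelism=4))  # 4 nodes at a time: 30.0 hours
```

For the largest deployment mentioned (24 nodes), sequential rewrites come to about 5 days of continuous running per deployment, before counting nodes with more than 200 GB.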
Options:
No downtime - As per the recommendation (DataStax documentation), if we upgrade 
one node at a time, i.e. in sequence, the upgrade cycle for one environment 
will take weeks, which is a DevOps concern.
Read only (no downtime) - Route read-only load to the DR system. We have 
resilience built in to take care of mutation scenarios, but if it takes more 
than, say, 3-4 hours, there will be a long catch-up exercise. Maintenance cost 
seems too high due to unknowns.
Downtime - Can upgrade all nodes in parallel as there are no live customers. 
This has direct customer impact, so we need to make the case on maintenance 
cost vs customer impact.
Please suggest how other organisations (those with 100+ nodes) are solving this 
scenario.

Regards
Shishir

