Re: Re-adding Decommissioned node
Hi Mark,

Please ensure that the node is not defined as a seed node in the yaml. Seed nodes don't bootstrap.

Thanks
Anuj

On Tue, Jun 27, 2017 at 9:56 PM, Mark Furlong wrote:
I have a node that has been decommissioned and it showed 'UL'; the data volume and the commit logs have been removed, and I now want to add that node back into my ring. When I add this node (bootstrap=true, start the Cassandra service), it comes back up in the ring as an existing node and shows as 'UN' instead of 'UJ'. Why is this? It has no data.

Mark Furlong
Sr. Database Administrator
mfurl...@ancestry.com | M: 801-859-7427 | O: 801-705-7115
1300 W Traverse Pkwy, Lehi, UT 84043
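A minimal sketch of the seed check Anuj suggests. The file path and IP addresses are illustrative assumptions, not values from the thread; on a real node you would grep the actual cassandra.yaml for the node's own address.

```shell
# Hedged sketch: a node listed in its own seed_provider will not bootstrap,
# so check the seeds list before restarting the decommissioned node.
# Path and IPs below are made up for illustration.
cat > /tmp/cassandra.yaml <<'EOF'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2"
EOF

is_seed() {  # $1 = node IP, $2 = path to cassandra.yaml
  grep -q "$1" "$2"
}

if is_seed "10.0.0.3" /tmp/cassandra.yaml; then
  echo "node is a seed: it will NOT bootstrap"
else
  echo "node is not a seed: safe to wipe data dirs and bootstrap"
fi
```

If the node's address does appear in the seeds list, remove it (or use another node as seed) before restarting, so the node joins as 'UJ' and streams data normally.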
Re: Hints files are not getting truncated
Hi Meg,

max_hint_window_in_ms = 3 hrs means that if a node is down/unresponsive for more than 3 hours, hints will no longer be stored for it until it becomes responsive again. It does not mean that already stored hints are truncated after 3 hours.

Regarding connection timeouts between DCs, please check your firewall and TCP settings on the nodes. A firewall between the DCs must not kill an idle connection that Cassandra still considers usable. Please see http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html . In a multi-DC setup, the documentation recommends increasing the number of hint delivery threads. You can try increasing it and check whether it improves the situation.

Thanks
Anuj

On Tue, Jun 27, 2017 at 9:47 PM, Meg Mara wrote:
Hello,

I am facing an issue with hinted handoff files in Cassandra v3.0.10. A DC1 node is storing a large number of hints for DC2 nodes (we are facing connection timeout issues). The problem is that the hint files which are created on DC1 are not getting deleted after the 3 hour window. Hints are now being stored as flat files in the Cassandra home directory, and I can see that old hints are being deleted, but at a very slow pace. It still contains hints from May.

max_hint_window_in_ms: 1080
max_hints_delivery_threads: 2

Why do you suppose this is happening? Any suggestions or recommendations would be much appreciated. Thanks for your time.

Meg Mara
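One detail worth double-checking in the settings quoted above: max_hint_window_in_ms is expressed in milliseconds, so a three-hour window is a much larger number than the 1080 shown:

```shell
# max_hint_window_in_ms is in milliseconds; a 3-hour window is:
three_hours_ms=$((3 * 60 * 60 * 1000))
echo "$three_hours_ms"   # 10800000
```

A value of 1080 would be barely over one second, which could itself explain unexpected hint behaviour and is worth verifying in the yaml.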
Re: Linux version update on DSE
Hi Nitan,

I think it would be simpler to take one node down at a time and replace it by bringing the new node up after the Linux upgrade: do the same Cassandra setup, use the replace_address option, and set auto_bootstrap=false (as the data is already there). No downtime, as it would be a rolling upgrade. No streaming, as the same tokens would work. If you have a recent C* version, use replace_address_first_boot. If that option is not available, use replace_address and make sure you remove it once the new node is up. Try it and let us know if it works for you.

Thanks
Anuj

On Tue, Jun 27, 2017 at 4:56 AM, Nitan Kainth wrote:
Right, we are just upgrading Linux on AWS. C* will remain at the same version.

On Jun 26, 2017, at 6:05 PM, Hannu Kröger wrote:
I understood he is updating Linux, not C*.
Hannu

On 27 June 2017 at 02:04:34, Jonathan Haddad (j...@jonhaddad.com) wrote:
It sounds like you're suggesting adding new nodes in to replace existing ones. You can't do that because it requires streaming between versions, which isn't supported. You need to take a node down, upgrade the C* version, then start it back up.
Jon

On Mon, Jun 26, 2017 at 3:56 PM Nitan Kainth wrote:
It's vnodes. We will add to replace the new IP in the yaml as well. Thank you.

> On Jun 26, 2017, at 4:47 PM, Hannu Kröger wrote:
> Looks OK. Step 1.5 would be to stop Cassandra on the existing node, but apart from that looks fine. Assuming you are using the same configs, and if you have hard-coded the token(s), you use the same.
> Hannu
>
>> On 26 Jun 2017, at 23.24, Nitan Kainth wrote:
>> Hi,
>> We are planning to update Linux for C* nodes on version 3.0. Has anybody done this in the recent past?
>> Here are the draft steps we are thinking of:
>> 1. Create new node. It might have a different IP address.
>> 2. Detach mounts from existing node
>> 3. Attach mounts to new node
>> 4. Start C*
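A hedged sketch of the replace-by-address step Anuj describes. The IP and file path are illustrative assumptions; on a real node this JVM option would go into cassandra-env.sh (or jvm.options), and it must be removed once the node has rejoined the ring.

```shell
# Sketch: start the replacement node claiming the old node's address so it
# reclaims the same tokens without streaming (data dirs already attached).
# OLD_IP and the file path are assumptions for illustration only.
OLD_IP="10.0.0.5"
echo "JVM_OPTS=\"\$JVM_OPTS -Dcassandra.replace_address_first_boot=${OLD_IP}\"" \
  > /tmp/cassandra-env-extra.sh
cat /tmp/cassandra-env-extra.sh
```

replace_address_first_boot has the advantage that it is ignored on subsequent restarts, whereas plain replace_address must be removed manually after the first successful start.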
Re: Partition range incremental repairs
Hi Chris,

Using pr with incremental repairs does not make sense. Primary range repair is an optimization over full repair. If you run full repair on an n-node cluster with RF=3, you would be repairing each piece of data three times. E.g. in a 5-node cluster with RF=3, a range may exist on nodes A, B and C. When full repair is run on node A, the entire data in that range gets synced with the replicas on nodes B and C. Now, when you run full repair on nodes B and C, you are wasting resources repairing data which is already repaired. Primary range repair ensures that when you run repair on a node, it ONLY repairs the data which is owned by that node. Thus, no node repairs data which is not owned by it and must be repaired by another node. Redundant work is eliminated.

Even with pr, each time you run it on all nodes you repair 100% of the data. Why repair the complete data in each cycle, even data which has not changed since the last repair cycle? This is where incremental repair comes in as an improvement. Once repaired, data is marked repaired so that the next repair cycle can focus on just the delta.

Now, let's go back to the example of the 5-node cluster with RF=3, this time running incremental repair on all nodes. When you repair the entire data on node A, all 3 replicas are marked as repaired. Even if you run incremental repair on all ranges on the second node, you would not re-repair the already repaired data. Thus, there is no advantage to repairing only the data owned by the node (the primary range of the node). You can run incremental repair on all the data present on a node, and Cassandra will make sure that when you repair data on other nodes, you only repair unrepaired data.

Thanks
Anuj

Sent from Yahoo Mail on Android

On Tue, Jun 6, 2017 at 4:27 PM, Chris Stokesmore wrote:
Hi all,

Wondering if anyone had any thoughts on this? At the moment the long running repairs cause us to be running them on two nodes at once for a bit of time, which obviously increases the cluster load.
On 2017-05-25 16:18 (+0100), Chris Stokesmore wrote:
> Hi,
>
> We are running a 7 node Cassandra 2.2.8 cluster, RF=3, and had been running repairs with the -pr option, via a cron job that runs on each node once per week.
>
> We changed that as some advice on the Cassandra IRC channel said it would cause more anticompaction, and http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/tools/toolsRepair.html says 'Performing partitioner range repairs by using the -pr option is generally considered a good choice for doing manual repairs. However, this option cannot be used with incremental repairs (default for Cassandra 2.2 and later)'.
>
> Only problem is our -pr repairs were taking about 8 hours, and now the non-pr repairs are taking 24+. I guess this makes sense, repairing 1/7 of the data increased to 3/7, except I was hoping to see a speed up after the first loop through the cluster as each repair will be marking much more data as repaired, right?
>
> Is running -pr with incremental repairs really that bad?
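For reference, the two invocation styles being compared in this thread might look like the following (a sketch only; the commands are echoed, not executed, and the flags reflect Cassandra 2.2's nodetool, where incremental repair is the default):

```shell
# Sketch of the two repair styles discussed in the thread.
repair_cmd() {  # $1 = "pr" (full repair, primary range only) or "inc" (incremental, all ranges)
  case "$1" in
    pr)  echo "nodetool repair -full -pr" ;;  # full repair of this node's primary range
    inc) echo "nodetool repair" ;;            # incremental repair, the 2.2+ default
    *)   return 1 ;;
  esac
}
repair_cmd pr
repair_cmd inc
```

As Anuj explains, the pr optimization and the incremental optimization both eliminate the same redundant work, so combining them gains nothing.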
Re: Partition range incremental repairs
Hi Chris,

Can you share the following info:
1. The exact repair commands you use for incremental repair and pr repair.
2. Repair time should be measured at the cluster level for incremental repair. So, what is the total time it takes to run repair on all nodes, incremental vs pr?
3. You are repairing one DC, DC3. How many DCs are there in total, and what is the RF for the keyspaces? Running pr on a specific DC would not repair the entire data.
4. 885 ranges? Where did you get this number — the logs? Can you share the number of ranges printed in the logs for both the incremental and the pr case?

Thanks
Anuj

Sent from Yahoo Mail on Android

On Tue, Jun 6, 2017 at 9:33 PM, Chris Stokesmore <chris.elsm...@demandlogic.co> wrote:
Thank you for the excellent and clear description of the different versions of repair Anuj, that has cleared up what I expected to be happening.

The problem now is that in our cluster, we are running repairs with options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [DC3], hosts: [], # of ranges: 885), and our repairs are taking over a day to complete, when previously, running with the partition range option, they were taking more like 8-9 hours. As I understand it, using incremental should have sped this process up, as all three sets of data on each repair job should be marked as repaired; however, this does not seem to be the case. Any ideas?

Chris

On 6 Jun 2017, at 16:08, Anuj Wadehra <anujw_2...@yahoo.co.in.INVALID> wrote:
> [Anuj's explanation of pr vs incremental repair, quoted in full in the previous message, trimmed.]
Re: LWT and non-LWT mixed
Hi Daniel,

What is the RF and CL for the DELETE? Are you using asynchronous writes? Are you firing both statements from the same node sequentially? Are you firing these queries in a loop such that more than one DELETE and LWT is fired for the same partition?

I think if you have the same client executing both statements sequentially in the same thread, i.e. one after another, and the delete is synchronous, it should work fine. The LWT will be executed after Cassandra has written to a quorum of nodes and will see the data. The Paxos round of the LWT shall only be initiated once the delete completes. I think LWT should not be mixed with normal writes when such writes are fired from multiple nodes/threads on the same partition.

Thanks
Anuj

Sent from Yahoo Mail on Android

On Tue, 10 Oct 2017 at 14:10, Daniel Woo wrote:
The document explains you cannot mix them: http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/dml/dmlLtwtTransactions.html

But what happens under the hood if I do? E.g.:
DELETE ...
INSERT ... IF NOT EXISTS

The coordinator has 4 steps to do for the second statement (INSERT):
1. prepare/promise a ballot
2. read the current row from replicas
3. propose the new value along with the ballot to replicas
4. commit and wait for acks from replicas

My question is: once the row is DELETEd, the next INSERT LWT should be able to see that row's tombstone in step 2, then successfully insert the new value. But my tests show that this often fails; does anybody know why?

--
Thanks & Regards,
Daniel
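A hedged illustration of the usual fix (keyspace, table, and keys below are made up): making the delete conditional as well keeps both mutations on the Paxos path, avoiding the race between a normal-consistency write and a serial read.

```shell
# Print the CQL pair as it might be issued; both statements are LWTs,
# so they serialize through Paxos instead of mixing write paths.
stmts=$(cat <<'EOF'
DELETE FROM ks.users WHERE id = 1 IF EXISTS;
INSERT INTO ks.users (id, name) VALUES (1, 'a') IF NOT EXISTS;
EOF
)
echo "$stmts"
```

This matches Anuj's point: the anomaly appears when a non-LWT write and an LWT race on the same partition from different threads or coordinators.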
Cassandra Upgrade with Different Protocol Version
Hi,

I would like to know how people are doing rolling upgrades of Cassandra clusters when there is a change in the native protocol version, e.g. when upgrading from Cassandra 2.1 to 3.11. During a rolling upgrade, if the client application is restarted on nodes, the client driver may first contact an upgraded Cassandra node with protocol v4 and permanently mark all old Cassandra nodes on v3 as down. This may lead to request failures.

Datastax recommends two ways to deal with this:
1. Before the upgrade, set the protocol version to the lower protocol version, and move to the higher version once the entire cluster is upgraded.
2. Make sure the driver only contacts upgraded Cassandra nodes during the rolling upgrade.

The second workaround will lead to failures, as you may not be able to meet the required consistency for some time. So let's consider the first workaround. Now imagine an application where the protocol version is not configurable and the code uses the default protocol version. You cannot apply the first workaround, because you would first have to upgrade your application on all nodes to make the protocol version configurable. How would you upgrade such a cluster without downtime? Thoughts?

Thanks
Anuj
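As a rough sketch of the version mapping involved (my understanding, worth verifying against the driver documentation): Cassandra 2.1 speaks native protocol v3 at most, while 3.0/3.11 speak v4, so for workaround 1 clients would be pinned to v3 for the duration of the rolling upgrade.

```shell
# Hedged mapping of Cassandra series to maximum native protocol version.
max_proto() {
  case "$1" in
    2.1*) echo 3 ;;
    3.*)  echo 4 ;;
    *)    echo unknown ;;
  esac
}
# During a 2.1 -> 3.11 rolling upgrade, pin drivers to the lower value:
max_proto 2.1
max_proto 3.11
```

The DataStax drivers expose this as an explicit protocol-version setting (e.g. a protocol_version option); if the application never surfaces that setting, you hit exactly the chicken-and-egg problem described above.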
Re: [External] Re: Whch version is the best version to run now?
We evaluated both 3.0.x and 3.11.x. +1 for 3.11.2, as we faced major performance issues with 3.0.x. We have NOT evaluated the new features on 3.11.x.

Anuj

Sent from Yahoo Mail on Android

On Tue, 6 Mar 2018 at 19:35, Alain RODRIGUEZ wrote:
Hello Tom,

It's good to hear this kind of feedback, thanks for sharing.

"3.11.x seems to get more love from the community wrt patches. This is why I'd recommend 3.11.x for new projects."

I also agree with this analysis.

"Stay away from any of the 2.x series, they're going EOL soonish and the newer versions are very stable."

+1 here as well. Maybe add that 3.11.x, described as 'very stable' above, aims at stabilizing Cassandra after the tick-tock releases; it is a 'bug fix' series that also brings the features developed during that period, even though one needs to be careful with some of the new features, even in the latest 3.11.x versions. I did not work that much with it yet, but I think I would pick 3.11.2 as well for a new cluster at the moment.

C*heers,
Alain Rodriguez - @arodream - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-03-05 12:39 GMT+00:00 Tom van der Woerdt:
We run on the order of a thousand Cassandra nodes in production. Most of that is 3.0.16, but new clusters are defaulting to 3.11.2 and some older clusters have been upgraded to it as well.

All of the bugs I encountered in 3.11.x were also seen in 3.0.x, but 3.11.x seems to get more love from the community wrt patches. This is why I'd recommend 3.11.x for new projects. Stay away from any of the 2.x series, they're going EOL soonish and the newer versions are very stable.

Tom van der Woerdt
Site Reliability Engineer, Booking.com B.V.

On Sat, Mar 3, 2018 at 12:25 AM, Jeff Jirsa wrote:
I'd personally be willing to run 3.0.16. 3.11.2 or .3 or whatever should also be similar, but I haven't personally tested it at any meaningful scale.

--
Jeff Jirsa

On Mar 2, 2018, at 2:37 PM, Kenneth Brotman wrote:
Seems like a lot of people are running old versions of Cassandra. What is the best version, the most reliable stable version, to use now?

Kenneth Brotman
Re: Upgrade to v3.11.3
Hi Shalom,

Just a suggestion: before upgrading to 3.11.3, make sure you are not impacted by any open critical defects, especially those related to range tombstones which may cause data loss, e.g. CASSANDRA-14861.

Please find my responses below:

Q: The upgrade process that I know of is from 2.0.14 to 2.1.x (higher than 2.1.9 I think) and then from 2.1.x to 3.x. Do I need to upgrade first to 3.0.x, or can I upgrade directly from 2.1.x to 3.11.3?
Response: Yes, you can upgrade from 2.0.14 to some latest stable version of 2.1.x (only 2.1.9+) and then upgrade to 3.11.3.

Q: Can I run upgradesstables on several nodes in parallel? Is it crucial to run it one node at a time?
Response: Yes, you can run it in parallel.

Q: When running upgradesstables on a node, does that node still serve writes and reads?
Response: Yes.

Q: Can I use OpenJDK 8 (instead of Oracle JDK) with C* 3.11.3?
Response: We have not tried it, but it should be okay. See https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-13916.

Q: Is there a way to speed up the upgradesstables process? (besides compaction_throughput)
Response: If clearing the pending compactions caused by rewriting sstables is a concern, you can probably also try increasing the number of concurrent compactors.

Disclaimer: The information provided in the above responses is my personal opinion based on the best of my knowledge and experience. We do not take any responsibility and are not liable for any damage caused by actions taken based on the above information.

Thanks
Anuj

On Wed, 16 Jan 2019 at 19:15, shalom sagges wrote:
Hi All,

I'm about to start a rolling upgrade process from version 2.0.14 to version 3.11.3. I have a few small questions:
- The upgrade process that I know of is from 2.0.14 to 2.1.x (higher than 2.1.9 I think) and then from 2.1.x to 3.x. Do I need to upgrade first to 3.0.x, or can I upgrade directly from 2.1.x to 3.11.3?
- Can I run upgradesstables on several nodes in parallel? Is it crucial to run it one node at a time?
- When running upgradesstables on a node, does that node still serve writes and reads?
- Can I use OpenJDK 8 (instead of Oracle JDK) with C* 3.11.3?
- Is there a way to speed up the upgradesstables process? (besides compaction_throughput)

Thanks!
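Pulling the answers above together, a per-node pass might look like the following outline. This is a sketch only: the steps are echoed, not executed, package/service management varies by install, and the -j flag for upgradesstables is an assumption to verify against your nodetool version.

```shell
# Hedged per-node outline of the rolling-upgrade steps discussed in the thread.
upgrade_node() {
  for step in \
    "nodetool drain" \
    "stop the cassandra service" \
    "install the new Cassandra binaries and merged config" \
    "start the cassandra service" \
    "nodetool upgradesstables -j 2"; do
    echo "$step"
  done
}
upgrade_node
```

Repeating this one node at a time keeps the cluster serving reads and writes throughout, while the upgradesstables runs themselves can overlap across nodes, as noted above.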
ApacheCon Europe 2019
Hi,

Do we have any plans for a dedicated Apache Cassandra track or sessions at ApacheCon Berlin in Oct 2019? The CFP closes 26 May, 2019.

Thanks
Anuj Wadehra