Re: inter dc bandwidth calculation
Hello, just as a small addition: the numbers also depend on the consistency level used for reads. The estimate below holds if you only read from local nodes. If you do reads at ALL, QUORUM, EACH_QUORUM, etc., you also need to include the read volume in the calculation.

Regards,
Georg

On Wed, 15 Jan 2020 at 19:35, Osman Yozgatlıoğlu <osman.yozgatlio...@gmail.com> wrote:

> Thank you. I have an insight now.
>
> Regards,
> Osman
>
> On Wed, 15 Jan 2020 at 19:18, Reid Pinchback wrote:
>
> > Oh, duh. Revise that. I was forgetting that multi-dc writes are sent to a single node in the other dc and tagged to be forwarded to other nodes within the dc.
> >
> > So your quick-and-dirty estimate would be more like (write volume) x 2 to leave headroom for random other mechanics.
> >
> > R
> >
> > On 1/15/20, 11:07 AM, "Reid Pinchback" wrote:
> >
> > I would think that it would be largely driven by the replication factor. It isn't that the sstables are forklifted from one dc to another; it's just that the writes being made to the memtables are also shipped around by the coordinator nodes as the writes happen. Operations at the sstable level, like compactions, are local to the node.
> >
> > One potential wrinkle that I'm unclear on is related to repairs. I don't know if merkle trees are biased to mostly bounce around only intra-dc, versus how often they are communicated inter-dc. Note that even queries can trigger some degree of repair traffic if you have a usage pattern of trying to read data recently written, because at the bleeding edge of the recent changes you'll have more cases of rows not having had time to settle to a consistent state.
> >
> > If you want a quick-and-dirty heuristic, I'd probably take (write volume) x (replication factor) x 2 as a guesstimate so you have some headroom for C* and TCP mechanics, but then monitor to see what your real use is.
> >
> > R
> >
> > On 1/15/20, 4:14 AM, "Osman Yozgatlıoğlu" <osman.yozgatlio...@gmail.com> wrote:
> >
> > Hello,
> >
> > Is there any way to calculate inter dc bandwidth requirements for proper operation? I can't find any info about this subject. Can we say how much sstable data collected at one dc has to be transferred to the other? I can calculate bandwidth from the generated sstables then. I have TWCS with a one-hour window.
> >
> > Regards,
> > Osman
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: user-h...@cassandra.apache.org
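To make the heuristic from this thread concrete, here is a small sketch. The function name and the example traffic numbers are hypothetical, not from the thread: per Reid's revised note, inter-DC write traffic is counted once rather than once per replica, and per Georg's addition, read volume is included only when reads cross DCs.

```python
def inter_dc_bandwidth_mb_s(write_volume_mb_s, read_volume_mb_s=0.0,
                            reads_cross_dc=False, headroom=2.0):
    """Quick-and-dirty inter-DC bandwidth estimate, in MB/s.

    Multi-dc writes are forwarded to a single node in the remote dc,
    so write volume is counted once, not multiplied by the replication
    factor. Read volume only crosses DCs at consistency levels such as
    QUORUM, EACH_QUORUM, or ALL. The headroom factor covers C* and TCP
    mechanics; monitor to see what your real use is.
    """
    volume = write_volume_mb_s + (read_volume_mb_s if reads_cross_dc else 0.0)
    return volume * headroom

# 50 MB/s of writes with local-only reads
print(inter_dc_bandwidth_mb_s(50))                           # 100.0
# same writes plus 30 MB/s of QUORUM reads
print(inter_dc_bandwidth_mb_s(50, 30, reads_cross_dc=True))  # 160.0
```

As the thread stresses, this is only a starting point for capacity planning; the real answer comes from monitoring actual inter-DC traffic.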
Re: How to read content of hints file and apply them manually?
We tried to tune sethintedhandoffthrottlekb to 100, 1024, and 10240, but nothing helped. Our hints-related parameters are as below; if you don't find a parameter below then it is not set in our environment and should be at its default value.

max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_enabled: true
hinted_handoff_throttle_in_kb: 100
max_hints_delivery_threads: 8
hints_directory: /var/lib/cassandra/hints
hints_flush_period_in_ms: 1
max_hints_file_size_in_mb: 128

On Mon, 27 Jan 2020 at 18:34, Jeff Jirsa wrote:

> The high cpu is probably the hints getting replayed slamming the write path.
>
> Slowing it down with the hint throttle may help. It’s not instant.
>
> On Jan 27, 2020, at 6:05 PM, Erick Ramirez wrote:
>
> > Increase the max_hint_window_in_ms setting in cassandra.yaml to more than 3 hours, perhaps 6 hours. If the issue still persists, networking may need to be tested for bandwidth issues.
>
> Just a note of warning about bumping up the hint window without understanding the pros and cons. Be aware that doubling it means:
>
> - you'll end up doubling the size of stored hints in the hints_directory
> - there'll be twice as many hints to replay when node(s) come back online
>
> There's always 2 sides to fiddling with the knobs in C*. Cheers!
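As a back-of-the-envelope check on why these throttle values matter, this sketch (hypothetical helper, illustrative numbers; it assumes every delivery thread sustains the full throttle rate, so treat the result as a rough lower bound) estimates how long a given hint backlog takes to drain:

```python
def hint_drain_hours(hints_size_gb, throttle_kb_s, delivery_threads):
    """Rough lower bound on the time to replay a hint backlog.

    Assumes every delivery thread streams at the full throttle rate,
    which real clusters rarely sustain.
    """
    total_kb = hints_size_gb * 1024 * 1024
    aggregate_kb_s = throttle_kb_s * delivery_threads
    return total_kb / aggregate_kb_s / 3600

# 10 GB of hints, 1024 KB/s throttle, 8 delivery threads
print(round(hint_drain_hours(10, 1024, 8), 2))  # 0.36
```

With the 100 KB/s throttle quoted above, the same 10 GB backlog would take roughly ten times longer, which is one way a backlog can outlive the hint window.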
Re: How to read content of hints file and apply them manually?
The high cpu is probably the hints getting replayed slamming the write path.

Slowing it down with the hint throttle may help. It’s not instant.

> On Jan 27, 2020, at 6:05 PM, Erick Ramirez wrote:
>
> > Increase the max_hint_window_in_ms setting in cassandra.yaml to more than 3 hours, perhaps 6 hours. If the issue still persists, networking may need to be tested for bandwidth issues.
>
> Just a note of warning about bumping up the hint window without understanding the pros and cons. Be aware that doubling it means:
>
> - you'll end up doubling the size of stored hints in the hints_directory
> - there'll be twice as many hints to replay when node(s) come back online
>
> There's always 2 sides to fiddling with the knobs in C*. Cheers!
Re: How to read content of hints file and apply them manually?
> Increase the max_hint_window_in_ms setting in cassandra.yaml to more than 3 hours, perhaps 6 hours. If the issue still persists, networking may need to be tested for bandwidth issues.

Just a note of warning about bumping up the hint window without understanding the pros and cons. Be aware that doubling it means:

- you'll end up doubling the size of stored hints in the hints_directory
- there'll be twice as many hints to replay when node(s) come back online

There's always 2 sides to fiddling with the knobs in C*. Cheers!
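The doubling warning above is straightforward arithmetic; a small sketch (hypothetical function name, illustrative write rate) of the worst-case hint volume for a given window:

```python
def stored_hints_gb(write_rate_mb_s, hint_window_hours, down_replicas=1):
    """Upper bound on hints accumulated while replicas are down:
    writes destined for a down replica are stored as hints for up
    to the duration of the hint window."""
    return write_rate_mb_s * 3600 * hint_window_hours * down_replicas / 1024

# 5 MB/s of writes, default 3-hour window, one node down
print(round(stored_hints_gb(5, 3), 1))  # 52.7
# doubling the window doubles the stored hints (and the replay work)
print(round(stored_hints_gb(5, 6), 1))  # 105.5
```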
Re: new node stops streaming..
You can increase the max number of open files on the new node. We find that 65K is too low for most production clusters and you can bump it up to 100K or 200K. We generally recommend 1 million, but YMMV:

- nofile 1048576

On Tue, Jan 28, 2020 at 11:55 AM Eunsu Kim wrote:

> Hi experts,
>
> I had a problem adding a new node. The joining node in datacenterA stops streaming while joining, so it stays in UJ. (datacenterB is fine.)
>
> 'nodetool netstats' on the stopped node looks like this:
>
> Mode: JOINING
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
>
> When I try 'nodetool rebuild' it changes to the following, but no streaming occurs:
>
> Mode: JOINING
> Rebuild 1df64590-4166-11ea-86a0-4b3cc5e92e4a
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
>
> I think this is related to the number of open file descriptors. Incoming Streaming Bytes went to zero after the number of open file descriptors reached the host's max (65536). Since then, the number of open file descriptors has decreased, but streaming has not resumed. And when I drop the joining process, it is automatically removed from the cluster.
>
> What should I do to add nodes to this data center in this case? Please advise.
>
> Thank you.
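For reference, the nofile recommendation above typically lands in the PAM limits configuration; a sketch assuming the Cassandra process runs as a user named cassandra (the file path and user name are assumptions — adjust for your distribution):

```
# /etc/security/limits.d/cassandra.conf (path is an assumption; some
# distros use /etc/security/limits.conf directly)
cassandra - nofile 1048576
```

Note that if Cassandra runs as a systemd service, PAM limits do not apply; the equivalent setting is LimitNOFILE in the service unit. Verify the effective limit on the running process rather than trusting the config file.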
Re: How to read content of hints file and apply them manually?
Surbhi,

The hints could be getting accumulated for one or both of the following reasons:

- Some node is becoming unavailable very routinely, which is unlikely
- The hints are getting replayed very slowly due to network bandwidth issues, which is more likely

Increase the max_hint_window_in_ms setting in cassandra.yaml to more than 3 hours, perhaps 6 hours. If the issue still persists, networking may need to be tested for bandwidth issues.

regards,
Deepak

On Tuesday, January 28, 2020, 01:01:51 a.m. UTC, Surbhi Gupta wrote:

> Why we think it might be related to hints is because if we truncate the hints then the load goes back to normal on the nodes. FYI, we had to run a repair after truncating the hints. Any thoughts?
>
> On Mon, 27 Jan 2020 at 15:27, Deepak Vohra wrote:
>
> > Hints are a stopgap measure and not a fix to the underlying issue. Run a full repair.
> >
> > On Monday, January 27, 2020, 10:17:01 p.m. UTC, Surbhi Gupta wrote:
> >
> > > Hi,
> > >
> > > We are on open source 3.11. We have an issue in one of the clusters where lots of hints get piled up and they don't get applied within the hinted handoff period (3 hours in our case), and the load and CPU of the server go very high.
> > >
> > > We see a lot of messages in system.log and debug.log. Our read_repair_chance and dclocal_read_repair_chance are 0.1. Any pointers are welcome.
> > >
> > > ERROR [ReadRepairStage:83] 2020-01-27 13:08:43,695 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:83,5,main]
> > > org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
> > > DEBUG [ReadRepairStage:111] 2020-01-27 13:10:06,663 ReadCallback.java:242 - Digest mismatch:
> > > org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(4759131696153881383, 9a21276d0af64de28d5d3023b69e) (142a55e1e28de7daa2ddc34a361474a0 vs fcba30f022ef25f456914c341022963d)
Re: Uneven token distribution with allocate_tokens_for_keyspace
Hi Leo,

The token assignment for each node in the cluster must be unique regardless of the datacenter the nodes are in. This is because the range of tokens available to assign to nodes is per cluster; token allocation is performed per node at a global level. A datacenter helps define the way data is replicated and has no influence on how tokens are assigned to nodes.

For example, if a new node is assigned one or more of the tokens already owned by another node in the cluster, the new node will take ownership of those tokens. This will happen regardless of which datacenter either node is in.

Regards,
Anthony

On Sat, 25 Jan 2020 at 02:11, Léo FERLIN SUTTON wrote:

> Hi Anthony!
>
> I have a follow-up question:
>
> > Check to make sure that no other node in the cluster is assigned any of the four tokens specified above. If there is another node in the cluster that is assigned one of the above tokens, increment the conflicting token by values of one until no other node in the cluster is assigned that token value. The idea is to make sure that these four tokens are unique to the node.
>
> I don't understand this part of the process. Why do tokens conflict if the nodes owning them are in a different datacenter?
>
> Regards,
>
> Leo
>
> On Thu, Dec 5, 2019 at 1:00 AM Anthony Grasso wrote:
>
>> Hi Enrico,
>>
>> Glad to hear the problem has been resolved and thank you for the feedback!
>>
>> Kind regards,
>> Anthony
>>
>> On Mon, 2 Dec 2019 at 22:03, Enrico Cavallin wrote:
>>
>>> Hi Anthony,
>>> thank you for your hints, now the new DC is well balanced within 2%. I did read your article, but I thought it was needed only for new "clusters", not also for new "DCs"; but RF is per DC so it makes sense.
>>>
>>> You TLP guys are doing a great job for the Cassandra community.
>>> Thank you,
>>> Enrico
>>>
>>> On Fri, 29 Nov 2019 at 05:09, Anthony Grasso wrote:
>>>
>>>> Hi Enrico,
>>>>
>>>> This is a classic chicken and egg problem with the allocate_tokens_for_keyspace setting. The allocate_tokens_for_keyspace setting uses the replication factor of a DC keyspace to calculate the token allocation when a node is added to the cluster for the first time.
>>>>
>>>> Nodes need to be added to the new DC before we can replicate the keyspace over to it. Herein lies the problem. We are unable to use allocate_tokens_for_keyspace unless the keyspace is replicated to the new DC. In addition, as soon as you change the keyspace replication to the new DC, new data will start to be written to it. To work around this issue you will need to do the following.
>>>>
>>>> 1. Decommission all the nodes in the *dcNew*, one at a time.
>>>> 2. Once all the *dcNew* nodes are decommissioned, wipe the contents of the *commitlog*, *data*, *saved_caches*, and *hints* directories on these nodes.
>>>> 3. Make the first node to add into *dcNew* a seed node. Set the seed list of the first node with its IP address and the IP addresses of the other seed nodes in the cluster.
>>>> 4. Set the *initial_token* setting for the first node. You can calculate the values using the algorithm in my blog post: https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html. For convenience I have calculated them: *-9223372036854775808,-4611686018427387904,0,4611686018427387904*. Note, remove the *allocate_tokens_for_keyspace* setting from the *cassandra.yaml* file for this (seed) node.
>>>> 5. Check to make sure that no other node in the cluster is assigned any of the four tokens specified above. If there is another node in the cluster that is assigned one of the above tokens, increment the conflicting token by values of one until no other node in the cluster is assigned that token value. The idea is to make sure that these four tokens are unique to the node.
>>>> 6. Add the seed node to the cluster. Make sure it is listed in *dcNew* by checking nodetool status.
>>>> 7. Create a dummy keyspace in *dcNew* that has a replication factor of 2.
>>>> 8. Set the *allocate_tokens_for_keyspace* value to be the name of the dummy keyspace for the other two nodes you want to add to *dcNew*. Note, remove the *initial_token* setting for these other nodes.
>>>> 9. Set *auto_bootstrap* to *false* for the other two nodes you want to add to *dcNew*.
>>>> 10. Add the other two nodes to the cluster, one at a time.
>>>> 11. If you are happy with the distribution, copy the data to *dcNew* by running a rebuild.
>>>>
>>>> Hope this helps.
>>>>
>>>> Regards,
>>>> Anthony
>>>>
>>>> On Fri, 29 Nov 2019 at 02:08, Enrico Cavallin <cavallin.enr...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>> I have an old datacenter with 4 nodes and 256
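The initial_token values quoted in step 4 can be reproduced with a short sketch of the even-spacing calculation for the Murmur3Partitioner token range (the function name is illustrative, not from the linked post):

```python
def evenly_spaced_tokens(num_nodes):
    """Evenly spaced initial_token values across the Murmur3Partitioner
    token range [-2**63, 2**63 - 1]."""
    step = 2**64 // num_nodes
    return [i * step - 2**63 for i in range(num_nodes)]

# Four nodes in the new DC, matching the values given in step 4
print(evenly_spaced_tokens(4))
# [-9223372036854775808, -4611686018427387904, 0, 4611686018427387904]
```

Step 5 still applies: if any generated token collides with an existing node's token, bump the conflicting value by one until it is unique.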
Re: How to read content of hints file and apply them manually?
Why we think it might be related to hints is because if we truncate the hints then the load goes back to normal on the nodes.

FYI, we had to run a repair after truncating the hints. Any thoughts?

On Mon, 27 Jan 2020 at 15:27, Deepak Vohra wrote:

> Hints are a stopgap measure and not a fix to the underlying issue. Run a full repair.
>
> On Monday, January 27, 2020, 10:17:01 p.m. UTC, Surbhi Gupta <surbhi.gupt...@gmail.com> wrote:
>
> > Hi,
> >
> > We are on open source 3.11. We have an issue in one of the clusters where lots of hints get piled up and they don't get applied within the hinted handoff period (3 hours in our case), and the load and CPU of the server go very high.
> >
> > We see a lot of messages in system.log and debug.log. Our read_repair_chance and dclocal_read_repair_chance are 0.1. Any pointers are welcome.
> >
> > ERROR [ReadRepairStage:83] 2020-01-27 13:08:43,695 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:83,5,main]
> > org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
> > DEBUG [ReadRepairStage:111] 2020-01-27 13:10:06,663 ReadCallback.java:242 - Digest mismatch:
> > org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(4759131696153881383, 9a21276d0af64de28d5d3023b69e) (142a55e1e28de7daa2ddc34a361474a0 vs fcba30f022ef25f456914c341022963d)
new node stops streaming..
Hi experts,

I had a problem adding a new node. The joining node in datacenterA stops streaming while joining, so it stays in UJ. (datacenterB is fine.)

'nodetool netstats' on the stopped node looks like this:

Mode: JOINING
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0

When I try 'nodetool rebuild' it changes to the following, but no streaming occurs:

Mode: JOINING
Rebuild 1df64590-4166-11ea-86a0-4b3cc5e92e4a
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0

I think this is related to the number of open file descriptors. Incoming Streaming Bytes went to zero after the number of open file descriptors reached the host's max (65536). Since then, the number of open file descriptors has decreased, but streaming has not resumed. And when I drop the joining process, it is automatically removed from the cluster.

What should I do to add nodes to this data center in this case? Please advise.

Thank you.
Re: How to read content of hints file and apply them manually?
Hints are a stopgap measure and not a fix to the underlying issue. Run a full repair.

On Monday, January 27, 2020, 10:17:01 p.m. UTC, Surbhi Gupta wrote:

> Hi,
>
> We are on open source 3.11. We have an issue in one of the clusters where lots of hints get piled up and they don't get applied within the hinted handoff period (3 hours in our case), and the load and CPU of the server go very high.
>
> We see a lot of messages in system.log and debug.log. Our read_repair_chance and dclocal_read_repair_chance are 0.1. Any pointers are welcome.
>
> ERROR [ReadRepairStage:83] 2020-01-27 13:08:43,695 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:83,5,main]
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
> DEBUG [ReadRepairStage:111] 2020-01-27 13:10:06,663 ReadCallback.java:242 - Digest mismatch:
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(4759131696153881383, 9a21276d0af64de28d5d3023b69e) (142a55e1e28de7daa2ddc34a361474a0 vs fcba30f022ef25f456914c341022963d)
Re: How to read content of hints file and apply them manually?
There isn't a tool that I'm aware of that's readily available to do that. Your best bet is to run a regular repair.

But really, hints are just a side-issue of a much wider problem: the nodes are overloaded. Is your application getting hit with much higher than expected traffic? The screenshots you posted show that even read-repairs aren't getting responses from replicas. You should really address the overload issue. Cheers!
How to read content of hints file and apply them manually?
Hi,

We are on open source 3.11. We have an issue in one of the clusters where lots of hints get piled up and they don't get applied within the hinted handoff period (3 hours in our case), and the load and CPU of the server go very high.

We see a lot of messages in system.log and debug.log. Our read_repair_chance and dclocal_read_repair_chance are 0.1. Any pointers are welcome.

ERROR [ReadRepairStage:83] 2020-01-27 13:08:43,695 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:83,5,main]
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
DEBUG [ReadRepairStage:111] 2020-01-27 13:10:06,663 ReadCallback.java:242 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(4759131696153881383, 9a21276d0af64de28d5d3023b69e) (142a55e1e28de7daa2ddc34a361474a0 vs fcba30f022ef25f456914c341022963d)
Re: [EXTERNAL] Re: sstableloader & num_tokens change
Odd. Have you seen this behavior? I ran a test last week, loaded snapshots from 4 nodes to 4 nodes (RF 3 on both ends) and did not notice a spike. That's not to say that it didn't happen, but I think I'd have noticed, as I was loading approx 250GB x 4 (although sequentially rather than 4x sstableloader in parallel).

Also, thanks to everyone for confirming there's no issue with num_tokens and sstableloader; appreciate it.

On Mon, Jan 27, 2020 at 9:02 AM Durity, Sean R wrote:

> I would suggest to be aware of potential data size expansion. If you load (for example) three copies of the data into a new cluster (because the RF of the origin cluster is 3), it will also get written to the RF of the new cluster (3 more times). So, you could see data expansion of 9x the original data size (or, origin RF * target RF), until compaction can run.
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
> *From:* Erick Ramirez
> *Sent:* Friday, January 24, 2020 11:03 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: sstableloader & num_tokens change
>
> > If I may just loop this back to the question at hand: I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token (or your preferred number of tokens) nodes (otherwise same # of nodes and same RF).
>
> No, there isn't. It will work as designed so you're good to go. Cheers!
>
> --
> The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
RE: [EXTERNAL] Re: sstableloader & num_tokens change
I would suggest to be aware of potential data size expansion. If you load (for example) three copies of the data into a new cluster (because the RF of the origin cluster is 3), it will also get written to the RF of the new cluster (3 more times). So, you could see data expansion of 9x the original data size (or, origin RF * target RF), until compaction can run.

Sean Durity – Staff Systems Engineer, Cassandra

From: Erick Ramirez
Sent: Friday, January 24, 2020 11:03 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: sstableloader & num_tokens change

> If I may just loop this back to the question at hand: I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token (or your preferred number of tokens) nodes (otherwise same # of nodes and same RF).

No, there isn't. It will work as designed so you're good to go. Cheers!
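The 9x figure above is just the product of the two replication factors; a small sketch (hypothetical function name, illustrative sizes) of the worst-case pre-compaction footprint when every origin node's snapshot is loaded:

```python
def loaded_size_gb(unique_data_gb, origin_rf, target_rf):
    """Worst-case on-disk size after sstableloader finishes and before
    compaction deduplicates: each of the origin_rf snapshot copies is
    written to target_rf replicas in the destination cluster."""
    return unique_data_gb * origin_rf * target_rf

# 250 GB of unique data, RF 3 on both ends
print(loaded_size_gb(250, 3, 3))  # 2250
```

Loading a snapshot from only one origin replica avoids the origin_rf multiplier, at the cost of missing any writes that replica never received.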
Re: sstableloader & num_tokens change
Hello,

Concerning the original question, I agree with @eric_ramirez: sstableloader is transparent with respect to the token allocation number.

Just for info @voytek, check this post out: https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

You may be interested to know whether your cluster is well balanced with 32 tokens. 32 tokens seems to be the future default value, but changing the default vnodes token number seems not to be so straightforward.

cheers,
Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Sat, Jan 25, 2020 at 5:05 AM Erick Ramirez wrote:

> > On the subject of DSBulk, sstableloader is the tool of choice for this scenario.
>
> +1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader for CSV/JSON formats. Cheers!