[ https://issues.apache.org/jira/browse/CASSANDRA-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671235#comment-16671235 ]
Matt Stump edited comment on CASSANDRA-11170 at 11/1/18 8:06 AM:
-----------------------------------------------------------------

We've seen this issue result in a large volume of dropped NTR requests in a couple of customer clusters, resulting in hints accumulating for nodes in other DCs. The distinguishing characteristic is that they write primarily in one DC and rely on replication to ship traffic to other DCs. The writes in the local DC are evenly distributed and fast. When these writes are shipped to the remote DC they're being sent to 1/3 of the nodes, which results in the NTR queue being overwhelmed, requests failing, and mutations being hinted. In some of the accounts, enabling cross-DC read repair mitigates most of the felt effects, but that does introduce lesser side effects.

Patch for random remote coordinator selection is attached. I also improved the trace log message for this behavior to make it clearer which node is acting as remote coordinator vs. forward replica.
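For readers who want to see the effect in isolation, here is a minimal Java sketch contrasting the two selection strategies discussed in this ticket. It is not the attached patch and not the StorageProxy code; the class name `RemoteCoordinatorSketch`, the methods `pickFirst`, `pickRandom`, and `simulate`, and the ordering of the target list are all assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration only; none of these names come from Cassandra.
public class RemoteCoordinatorSketch {

    // Behaviour described in the ticket: the first replica in the
    // remote-DC target list always receives the mutation directly.
    static String pickFirst(List<String> remoteDcTargets) {
        return remoteDcTargets.get(0);
    }

    // Alternative: choose the remote coordinator uniformly at random.
    static String pickRandom(List<String> remoteDcTargets, Random random) {
        return remoteDcTargets.get(random.nextInt(remoteDcTargets.size()));
    }

    // Counts how often each replica is chosen as remote coordinator
    // over the given number of simulated mutations.
    static Map<String, Integer> simulate(List<String> targets, int mutations,
                                         boolean randomize, Random random) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String target : targets) {
            counts.put(target, 0);
        }
        for (int i = 0; i < mutations; i++) {
            String coordinator = randomize ? pickRandom(targets, random) : pickFirst(targets);
            counts.merge(coordinator, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Addresses taken from the log excerpt in the issue description;
        // the list order is an assumption.
        List<String> targets = Arrays.asList("54.183.209.219", "52.53.215.74", "54.183.23.201");
        Random random = new Random();
        System.out.println("first-in-list: " + simulate(targets, 3000, false, random));
        System.out.println("random pick:   " + simulate(targets, 3000, true, random));
    }
}
```

With the first-in-list strategy every simulated mutation lands on the same replica, while the random strategy spreads them roughly evenly across all three. Real replica lists vary per token range, so actual skew depends on the ring layout; the sketch only shows the per-list effect.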
> Uneven load can be created by cross DC mutation propagations, as remote forwarding node is not randomly picked
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11170
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11170
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Wei Deng
>            Assignee: Wei Deng
>            Priority: Major
>         Attachments: 11170.patch
>
>
> I was looking at the o.a.c.service.StorageProxy code and realized that it seems to always pick the first IP in the remote DC target list as the destination whenever it needs to send a mutation to a remote DC. See these lines in the code:
> https://github.com/apache/cassandra/blob/1944bf507d66b5c103c136319caeb4a9e3767a69/src/java/org/apache/cassandra/service/StorageProxy.java#L1280-L1301
> This could cause one node in the remote DC to receive more mutation messages than the other nodes, and hence an uneven workload distribution.
> A trivial test (with TRACE logging level enabled) on a 3+3 node cluster proved the problem, see the system.log entries below:
> {code}
> INFO [RMI TCP Connection(18)-54.173.227.52] 2016-02-13 09:54:55,948 StorageService.java:3353 - set log level to TRACE for classes under 'org.apache.cassandra.service.StorageProxy' (if the level doesn't look like 'TRACE' then the logger couldn't parse 'TRACE')
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:15,148 StorageProxy.java:1284 - Adding FWD message to 8996@/52.53.215.74
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:15,149 StorageProxy.java:1284 - Adding FWD message to 8997@/54.183.23.201
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:15,149 StorageProxy.java:1289 - Sending message to 8998@/54.183.209.219
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:22,939 StorageProxy.java:1284 - Adding FWD message to 9032@/52.53.215.74
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:22,940 StorageProxy.java:1284 - Adding FWD message to 9033@/54.183.23.201
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:22,941 StorageProxy.java:1289 - Sending message to 9034@/54.183.209.219
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:28,975 StorageProxy.java:1284 - Adding FWD message to 9064@/52.53.215.74
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:28,976 StorageProxy.java:1284 - Adding FWD message to 9065@/54.183.23.201
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:28,977 StorageProxy.java:1289 - Sending message to 9066@/54.183.209.219
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:33,464 StorageProxy.java:1284 - Adding FWD message to 9094@/52.53.215.74
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:33,465 StorageProxy.java:1284 - Adding FWD message to 9095@/54.183.23.201
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:33,478 StorageProxy.java:1289 - Sending message to 9096@/54.183.209.219
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:39,243 StorageProxy.java:1284 - Adding FWD message to 9121@/52.53.215.74
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:39,244 StorageProxy.java:1284 - Adding FWD message to 9122@/54.183.23.201
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:39,244 StorageProxy.java:1289 - Sending message to 9123@/54.183.209.219
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:44,248 StorageProxy.java:1284 - Adding FWD message to 9145@/52.53.215.74
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:44,249 StorageProxy.java:1284 - Adding FWD message to 9146@/54.183.23.201
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:44,249 StorageProxy.java:1289 - Sending message to 9147@/54.183.209.219
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:49,731 StorageProxy.java:1284 - Adding FWD message to 9170@/52.53.215.74
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:49,734 StorageProxy.java:1284 - Adding FWD message to 9171@/54.183.23.201
> TRACE [SharedPool-Worker-1] 2016-02-13 09:55:49,735 StorageProxy.java:1289 - Sending message to 9172@/54.183.209.219
> INFO [RMI TCP Connection(22)-54.173.227.52] 2016-02-13 09:56:19,545 StorageService.java:3353 - set log level to INFO for classes under 'org.apache.cassandra.service.StorageProxy' (if the level doesn't look like 'INFO' then the logger couldn't parse 'INFO')
> {code}
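To quantify the skew from a TRACE log like the one quoted above, the direct sends and forwards can be tallied per target address. The following is a rough Java sketch under stated assumptions: the class name `ForwardTraceTally`, the `tally` helper, and the two regular expressions are hypothetical, and the patterns match only the message wording shown in this excerpt (the attached patch changes the trace message, so they may not match newer logs).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the TRACE output quoted in the ticket.
public class ForwardTraceTally {

    // Target that receives the mutation directly (the remote coordinator).
    static final Pattern DIRECT = Pattern.compile("Sending message to \\d+@/(\\S+)");
    // Targets that the remote coordinator is asked to forward to.
    static final Pattern FORWARDED = Pattern.compile("Adding FWD message to \\d+@/(\\S+)");

    // Counts, per address, how many lines match the given pattern.
    static Map<String, Integer> tally(List<String> lines, Pattern pattern) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            if (matcher.find()) {
                counts.merge(matcher.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three lines copied from the log excerpt; a real run would read system.log.
        List<String> lines = Arrays.asList(
            "TRACE [SharedPool-Worker-1] 2016-02-13 09:55:15,148 StorageProxy.java:1284 - Adding FWD message to 8996@/52.53.215.74",
            "TRACE [SharedPool-Worker-1] 2016-02-13 09:55:15,149 StorageProxy.java:1284 - Adding FWD message to 8997@/54.183.23.201",
            "TRACE [SharedPool-Worker-1] 2016-02-13 09:55:15,149 StorageProxy.java:1289 - Sending message to 8998@/54.183.209.219");
        System.out.println("direct sends: " + tally(lines, DIRECT));
        System.out.println("forwards:     " + tally(lines, FORWARDED));
    }
}
```

In the excerpt above, every "Sending message to" line names 54.183.209.219, which is the pattern the ticket describes: the same remote node receives the direct send for each mutation.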