[
https://issues.apache.org/jira/browse/CASSANDRA-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362569#comment-15362569
]
Joel Knighton commented on CASSANDRA-11740:
-------------------------------------------
I don't have any great ideas here beyond Jeremiah's suggestion above. When
using GPFS, topology lookups follow a hierarchy.
First, we look for the information in gossip.
Then, if we have a fallback PropertyFileSnitch, we will use that.
If we don't, we'll first look in the system keyspace and then return defaults.
The default for GPFS is UNKNOWN_RACK/UNKNOWN_DC.
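To make the lookup order concrete, here is a minimal sketch of the hierarchy described above. It is not the actual GossipingPropertyFileSnitch code: the function name, the dictionary-based inputs, and the "default" key for the fallback file are all assumptions made for illustration.

```python
from typing import Mapping, Optional, Tuple

# Defaults named in the comment above for GPFS.
DEFAULT_DC = "UNKNOWN_DC"
DEFAULT_RACK = "UNKNOWN_RACK"

Location = Tuple[str, str]  # (datacenter, rack)


def resolve_location(
    endpoint: str,
    gossip_state: Mapping[str, Location],
    fallback_topology: Optional[Mapping[str, Location]],
    saved_peers: Mapping[str, Location],
) -> Location:
    """Hypothetical model of the lookup order; all inputs are plain dicts."""
    # 1. Information learned through gossip wins.
    if endpoint in gossip_state:
        return gossip_state[endpoint]

    # 2. If a fallback PropertyFileSnitch is configured, use it.
    #    Assumption: unknown endpoints map to a "default" entry in that file.
    if fallback_topology is not None:
        file_default = fallback_topology.get("default", (DEFAULT_DC, DEFAULT_RACK))
        return fallback_topology.get(endpoint, file_default)

    # 3. Otherwise look at what was saved in the system keyspace.
    if endpoint in saved_peers:
        return saved_peers[endpoint]

    # 4. Nothing known anywhere: return the GPFS defaults.
    return (DEFAULT_DC, DEFAULT_RACK)
```

In this model, a node that still has a fallback file with a DC1/r1 default reports DC1/r1 for any endpoint missing from gossip, even if the system keyspace holds the correct location.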
I have no idea how these values could get into gossip or the system keyspace
of the node without having this configured in a file.
Since DC1/r1 are the default options given in the sample
cassandra-topology.properties distributed with Cassandra, it seems likely that
this config file has not been removed from all nodes.
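One way to check for a leftover file is a small script run on each node. This is only a sketch: the helper name is hypothetical, and the configuration directories to pass in depend on how Cassandra was installed.

```python
from pathlib import Path
from typing import Iterable, List

TOPOLOGY_FILE = "cassandra-topology.properties"


def find_leftover_topology_files(conf_dirs: Iterable[str]) -> List[Path]:
    """Return the path of every topology file found in the given directories."""
    found = []
    for conf_dir in conf_dirs:
        candidate = Path(conf_dir) / TOPOLOGY_FILE
        # Only report regular files; missing directories are simply skipped.
        if candidate.is_file():
            found.append(candidate)
    return found
```

An empty result for a node's configuration directory means no such file is present there; any path returned is a candidate for removal.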
That said, if the information isn't present in gossip, there is likely another
problem as well. This would be easier to debug with debug/trace level logs
from some node A that shows bad nodetool status output for node B, along with
the debug/trace level logs from node B.
> Nodes have wrong membership view of the cluster
> -----------------------------------------------
>
> Key: CASSANDRA-11740
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11740
> Project: Cassandra
> Issue Type: Bug
> Reporter: Dikang Gu
> Assignee: Joel Knighton
> Fix For: 2.2.x, 3.x
>
>
> We have a few hundred nodes across 3 data centers, and we are doing a few
> million writes per second into the cluster.
> The problem we found is that some nodes (>10) have a very wrong view of the
> cluster.
> For example, we have 3 data centers A, B and C. On the problem nodes, the
> output of 'nodetool status' shows that ~100 nodes are not in data center A,
> B, or C. Instead, it shows those nodes in DC1 and rack r1, which is very
> wrong. As a result, the node will return wrong results to client requests.
> {code}
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address                          Load       Tokens  Owns  Host ID                               Rack
> UN  2401:db00:11:6134:face:0:1:0     509.52 GB  256     ?     e24656ac-c3b2-4117-b933-a5b06852c993  r1
> UN  2401:db00:11:b218:face:0:5:0     510.01 GB  256     ?     53da2104-b1b5-4fa5-a3dd-52c7557149f9  r1
> UN  2401:db00:2130:5133:face:0:4d:0  459.75 GB  256     ?     ef8311f0-f6b8-491c-904d-baa925cdd7c2  r1
> {code}
> We are using GossipingPropertyFileSnitch.
> Thanks
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)