[ https://issues.apache.org/jira/browse/CASSANDRA-19361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe updated CASSANDRA-19361:
----------------------------------------
Resolution: Cannot Reproduce
Status: Resolved (was: Triage Needed)
> fix node info NPE when ClusterMetadata is null
> ----------------------------------------------
>
> Key: CASSANDRA-19361
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19361
> Project: Cassandra
> Issue Type: Bug
> Components: Tool/nodetool, Transactional Cluster Metadata
> Reporter: Ling Mao
> Assignee: Ling Mao
> Priority: Normal
> Fix For: 5.0.x
>
> Attachments: CASSANDRA-19361-stack-error.txt
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h3. How
>
> I created a cluster with 3 nodes (it worked well), then added a fourth node
> to the cluster.
> When executing nodetool info, I got the following exception:
> {code:java}
> ➜ bin ./nodetool info
> java.lang.NullPointerException
>     at org.apache.cassandra.service.StorageService.operationMode(StorageService.java:3744)
>     at org.apache.cassandra.service.StorageService.isBootstrapFailed(StorageService.java:3810)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
> ➜ bin ./nodetool info
> WARN [InternalResponseStage:152] 2024-02-02 11:45:15,731 RemoteProcessor.java:213 - Got error from /127.0.0.4:7000: TIMEOUT when sending TCM_COMMIT_REQ, retrying on CandidateIterator{candidates=[/127.0.0.4:7000], checkLive=true}
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>     at org.apache.cassandra.service.StorageService.getLocalHostId(StorageService.java:1904)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
>     at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at java.base/sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:260)
> {code}
> Server 1 cannot execute nodetool info or open a CQL shell; servers 2 and 3 can.
> I also tried to query the system-prefixed tables; the resulting stack trace is
> attached (CASSANDRA-19361-stack-error.txt) for further debugging. I could not
> find a way to recover. After deleting the data (losing all of it) and
> restarting, everything worked again.
> {code:java}
> ➜ bin ./nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load  Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.2  ?     16      51.2%             6d194555-f6eb-41d0-c000-000000000002  rack1
> DN  127.0.0.4  ?     16      48.8%             6d194555-f6eb-41d0-c000-000000000001  rack1
> {code}
> h3. When
>
> It was introduced by the CEP-21 patch. Regardless of the root cause, a null
> check is needed to stop the NPE from propagating to callers.
> {code:java}
> Implementation of Transactional Cluster Metadata as described in CEP-21
> Hash: ae084237
>
> code diff:
>
>  public String getLocalHostId()
>  {
> -    UUID id = getLocalHostUUID();
> -    return id != null ? id.toString() : null;
> +    return getLocalHostUUID().toString();
>  }
>
>  public UUID getLocalHostUUID()
>  {
> -    UUID id = getTokenMetadata().getHostId(FBUtilities.getBroadcastAddressAndPort());
> -    if (id != null)
> -        return id;
> -    // this condition is to prevent accessing the tables when the node is not started yet, and in particular,
> -    // when it is not going to be started at all (e.g. when running some unit tests or client tools).
> -    else if ((DatabaseDescriptor.isDaemonInitialized() || DatabaseDescriptor.isToolInitialized()) && CommitLog.instance.isStarted())
> -        return SystemKeyspace.getLocalHostId();
> -
> -    return null;
> +    // Metadata collector requires using local host id, and flush of IndexInfo may race with
> +    // creation and initialization of cluster metadata service. Metadata collector does accept
> +    // null localhost ID values, it's just that TokenMetadata was created earlier.
> +    ClusterMetadata metadata = ClusterMetadata.currentNullable();
> +    if (metadata == null || metadata.directory.peerId(getBroadcastAddressAndPort()) == null)
> +        return null;
> +    return metadata.directory.peerId(getBroadcastAddressAndPort()).toUUID();
>  }
> {code}
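In the diff, the new getLocalHostUUID() can legitimately return null (no ClusterMetadata yet, or the local address not yet in the directory), while getLocalHostId() now calls toString() on its result unconditionally. That is consistent with the NPE at StorageService.getLocalHostId in the second trace. The sketch below shows a null-safe getLocalHostId() that restores the pre-CEP-21 guard. It is a minimal, self-contained illustration under stated assumptions: LocalHostIdSketch, Directory, ClusterMetadata, register() and the String broadcast address are hypothetical toy stand-ins, not Cassandra's real classes (in Cassandra, directory.peerId(...) returns an object converted with toUUID()).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical toy model, NOT Cassandra's real classes: it only mimics the shape of the
// ClusterMetadata.currentNullable() / directory.peerId(...) lookup from the CEP-21 diff.
class LocalHostIdSketch
{
    /** Hypothetical stand-in for the membership directory: broadcast address -> host UUID. */
    static final class Directory
    {
        private final Map<String, UUID> peerIds = new HashMap<>();

        void register(String address, UUID id)
        {
            peerIds.put(address, id);
        }

        /** Returns null while the address is not registered, e.g. while a node is joining. */
        UUID peerId(String address)
        {
            return peerIds.get(address);
        }
    }

    /** Hypothetical stand-in for ClusterMetadata; only the directory is modelled. */
    static final class ClusterMetadata
    {
        final Directory directory = new Directory();
    }

    private final ClusterMetadata metadata;   // may be null, like ClusterMetadata.currentNullable()
    private final String broadcastAddress;    // stand-in for getBroadcastAddressAndPort()

    LocalHostIdSketch(ClusterMetadata metadata, String broadcastAddress)
    {
        this.metadata = metadata;
        this.broadcastAddress = broadcastAddress;
    }

    /** Same contract as the post-CEP-21 getLocalHostUUID(): null when the id is unknown. */
    UUID getLocalHostUUID()
    {
        if (metadata == null)
            return null;
        return metadata.directory.peerId(broadcastAddress);
    }

    /** Null-safe: keeps the pre-CEP-21 guard instead of calling toString() on null. */
    String getLocalHostId()
    {
        UUID id = getLocalHostUUID();
        return id != null ? id.toString() : null;
    }
}
```

With new LocalHostIdSketch(null, "127.0.0.4:7000"), getLocalHostId() returns null instead of throwing; once the address is registered in the directory, it returns the UUID string. The first trace, from operationMode(), presumably involves a separate dereference and would need its own guard.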
--
This message was sent by Atlassian Jira
(v8.20.10#820010)