[
https://issues.apache.org/jira/browse/CASSANDRA-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902115#comment-17902115
]
Paulo Motta commented on CASSANDRA-19902:
-----------------------------------------
It turns out it was trickier to revert CASSANDRA-11537 than to just fix this
regression, because there was no test to verify the original behavior and this
somewhat overlaps in trunk with CASSANDRA-18330 and CASSANDRA-19384.
To give a little context on CASSANDRA-11537, StorageServiceMBean was being
published before StorageService had finished initializing. This caused weird
errors (e.g. AssertionError) when nodetool was queried before StorageService
initialization was complete.
The fix of CASSANDRA-11537 was to only publish the JMX interface after
StorageService has finished initializing. An unintended side-effect was that
StorageServiceMBean is no longer available during bootstrap on 5.0, preventing
commands like "nodetool netstats"/"nodetool status"/"nodetool gossipinfo" on
the bootstrapping node.
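To illustrate the failure mode, here is a minimal, self-contained sketch using only the JDK's JMX API. It is not Cassandra code: DemoService, DemoServiceMBean and the org.example.sketch object name are hypothetical stand-ins for StorageService and its MBean. It shows that querying an object name before the MBean is published fails with InstanceNotFoundException, while the same query succeeds once the MBean is registered.

```java
import java.lang.management.ManagementFactory;
import javax.management.InstanceNotFoundException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanPublishSketch {
    // Standard MBean convention: interface name = implementation class name + "MBean".
    public interface DemoServiceMBean {
        String getOperationMode();
    }

    // Hypothetical stand-in for StorageService.
    public static class DemoService implements DemoServiceMBean {
        private volatile String mode = "STARTING";
        public String getOperationMode() { return mode; }
        public void setMode(String newMode) { mode = newMode; }
    }

    // Hypothetical object name, deliberately not the real Cassandra one.
    public static final String NAME = "org.example.sketch:type=DemoService";

    // Reads the OperationMode attribute, reporting a missing MBean as a string.
    public static String query(MBeanServer mbs, ObjectName name) {
        try {
            return (String) mbs.getAttribute(name, "OperationMode");
        } catch (InstanceNotFoundException e) {
            return "InstanceNotFoundException";
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns what a JMX client sees before and after the MBean is published.
    public static String[] run() {
        try {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(NAME);
            DemoService service = new DemoService();

            String before = query(mbs, name);  // MBean not published yet

            mbs.registerMBean(service, name);  // publish before "bootstrap"
            service.setMode("JOINING");
            String during = query(mbs, name);  // now visible to JMX clients

            mbs.unregisterMBean(name);         // clean up so run() is repeatable
            return new String[] { before, during };
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("before publish: " + result[0]);
        System.out.println("during bootstrap: " + result[1]);
    }
}
```

This matches the symptom in the ticket description: nodetool is a JMX client, so it can only see the bean after registration.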
I created a JVM dtest
[here|https://github.com/apache/cassandra/pull/3717/commits/b03d7df8fa9405b8b2c2bc7b1cbf2d877d77a2ad]
based on {{GossipTest.testPreventStoppingGossipDuringBootstrap}}. This
test passes on 4.1 but fails on 5.0, indicating there was an uncaught change
of behavior introduced by CASSANDRA-11537.
The fix on 5.0 is straightforward: publish StorageServiceMBean in
StorageService.initServer before bootstrap starts, not only after it
finishes (done
[here|https://github.com/apache/cassandra/pull/3717/commits/bb45635ff4abaa9b5d6d2984b0f888323200fe19]).
Interestingly, on trunk StorageServiceMBean is published during
bootstrap as expected, since this was fixed by CASSANDRA-19384, but [this test
assertion|https://github.com/apache/cassandra/blob/3f9176318b094c10f996ab7653ce5659ea69018f/test/distributed/org/apache/cassandra/distributed/test/ring/BootstrapTest.java#L328]
fails, indicating that the node state is incorrectly reported as "STARTING"
instead of "JOINING" during bootstrap.
I think the reason for this is that StorageService.getOperationMode was only
returning the actual CMS state [if the StorageService.initialized flag is
set|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L3818],
which is not the case while the node is bootstrapping. I think the fix for this
is to return the CMS state once the log has finished replaying, [done
here|https://github.com/apache/cassandra/pull/3716/commits/9613e3854ecd2665b091aa0a574fe833fa4ee994].
Let me know if this makes sense [~marcuse].
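The guard change described above can be sketched roughly as follows. This is a simplified illustration, not the actual StorageService.getOperationMode code: the Mode enum, the two boolean parameters and both method names are hypothetical, and it assumes the fallback value is "STARTING", as the failing assertion suggests.

```java
public class OperationModeSketch {
    // Hypothetical subset of node states, for illustration only.
    public enum Mode { STARTING, JOINING, NORMAL }

    // Old guard: the cluster-metadata state is only reported once the
    // "initialized" flag is set, which happens after bootstrap completes.
    public static Mode guardedByInitialized(boolean initialized, Mode cmsState) {
        return initialized ? cmsState : Mode.STARTING;
    }

    // Proposed guard: report the cluster-metadata state as soon as the
    // log has finished replaying, which is already true during bootstrap.
    public static Mode guardedByLogReplay(boolean logReplayed, Mode cmsState) {
        return logReplayed ? cmsState : Mode.STARTING;
    }

    public static void main(String[] args) {
        // A bootstrapping node: log replayed, but not yet initialized.
        boolean initialized = false;
        boolean logReplayed = true;
        Mode cmsState = Mode.JOINING;

        System.out.println("old guard: " + guardedByInitialized(initialized, cmsState));
        System.out.println("new guard: " + guardedByLogReplay(logReplayed, cmsState));
    }
}
```

For a bootstrapping node the old guard hides the real state, while the new one reports "JOINING".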
I suspect the root cause of both CASSANDRA-11537 and CASSANDRA-19384 is
that StorageServiceMBean was [being published in the StorageService
constructor|https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/service/StorageService.java#L490]
before these patches, causing it to be published on the first access to
StorageService.instance, before the basic server scaffolding was initialized by
{{CassandraDaemon.setup}}.
I think the correct fix to the issue above is to publish StorageServiceMBean
during StorageService.initServer, since at that point everything needed to serve
JMX commands should have been initialized, and whatever is not should be guarded by
the StorageService.initialized flag. I think this allows us to simplify and safely
remove all external uses of StorageService.registerMBeans, as done
[here|https://github.com/apache/cassandra/pull/3716/commits/2e9004f7fa100bedfcccbff352c8a47a0435642e].
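As a rough sketch of the lifecycle proposed above (again plain JDK JMX, not the real patch): the constructor has no JMX side effects, the MBean is published at the start of an explicit initServer step, and an operation that needs full initialization is guarded by a flag. LifecycleDemo, its MBean interface, the object name, guardedOperation and the Runnable hook are all hypothetical names used only for illustration.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class InitServerSketch {
    // Hypothetical stand-in for StorageServiceMBean.
    public interface LifecycleDemoMBean {
        String getOperationMode();
        String guardedOperation();
    }

    public static class LifecycleDemo implements LifecycleDemoMBean {
        // MBean name kept in one constant (hypothetical name).
        public static final String MBEAN_NAME = "org.example.sketch:type=LifecycleDemo";
        private volatile boolean initialized;
        private volatile String mode = "STARTING";

        // Constructing the service does not publish anything.
        public LifecycleDemo() {}

        public String getOperationMode() { return mode; }

        // Needs full initialization, so it is guarded by the flag.
        public String guardedOperation() {
            if (!initialized) throw new IllegalStateException("not initialized yet");
            return "ok";
        }

        // Publishes the MBean first, then "bootstraps"; the hook runs mid-bootstrap.
        public void initServer(Runnable duringBootstrap) {
            publish();
            mode = "JOINING";
            duringBootstrap.run();
            mode = "NORMAL";
            initialized = true;
        }

        private void publish() {
            try {
                MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
                ObjectName name = new ObjectName(MBEAN_NAME);
                if (!mbs.isRegistered(name)) mbs.registerMBean(this, name);
            } catch (JMException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    static String readMode(MBeanServer mbs, ObjectName name) {
        try {
            return (String) mbs.getAttribute(name, "OperationMode");
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns: registered after constructor?, mode during bootstrap,
    // guarded operation outcome during bootstrap, mode after initServer.
    public static String[] run() {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        String[] seen = new String[4];
        try {
            ObjectName name = new ObjectName(LifecycleDemo.MBEAN_NAME);
            LifecycleDemo service = new LifecycleDemo();
            seen[0] = String.valueOf(mbs.isRegistered(name));
            try {
                service.initServer(() -> {
                    seen[1] = readMode(mbs, name);
                    try {
                        service.guardedOperation();
                        seen[2] = "allowed";
                    } catch (IllegalStateException e) {
                        seen[2] = "rejected";
                    }
                });
                seen[3] = readMode(mbs, name);
            } finally {
                if (mbs.isRegistered(name)) mbs.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", run()));
    }
}
```

The point of the design is that read-only queries work throughout bootstrap, while anything that depends on full initialization fails with a clear error instead of an AssertionError.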
In addition to this I also included these two cosmetic fixes to do some cleanup:
1) Extract StorageService mbean name to constant
([commit|https://github.com/apache/cassandra/pull/3716/commits/d94a126cb09ccf8b8869c8e13a0b35dc96ca41ce])
2) Remove StorageService.jmxObjectName, since I didn't find a reason for this
variable; it seems to have been introduced by CASSANDRA-4767
([commit|https://github.com/apache/cassandra/pull/3716/commits/3f9176318b094c10f996ab7653ce5659ea69018f])
I submitted preliminary CI; the failures look unrelated, but I will try to get a
cleaner run:
* [5.0 PR|https://github.com/apache/cassandra/pull/3717]
[5.0-CI|https://ci-cassandra.apache.org/job/Cassandra-devbranch-5/88/testReport/]
* [trunk PR|https://github.com/apache/cassandra/pull/3716]
[trunk-CI|https://ci-cassandra.apache.org/job/Cassandra-devbranch-5/87/]
Let me know if you can review this [~marcuse] since this overlaps with CMS and
is related to CASSANDRA-19384.
> StorageService JMX mbean is not available during bootstrap
> ----------------------------------------------------------
>
> Key: CASSANDRA-19902
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19902
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Tool/nodetool
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Normal
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Looks like the seemingly harmless cosmetic patch from CASSANDRA-11537 causes
> the StorageServiceMBean to not be available during bootstrap. This causes
> commands like "nodetool netstats/status/etc" to not be available on the
> bootstrapping node with the following error:
> {code:none}
> -- StackTrace --
> javax.management.InstanceNotFoundException: org.apache.cassandra.db:type=StorageService
> at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1083)
> at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:637)
> {code}
> This ticket is just to revert CASSANDRA-11537, we can re-add the improvement
> of that ticket later.