[ 
https://issues.apache.org/jira/browse/CASSANDRA-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902115#comment-17902115
 ] 

Paulo Motta commented on CASSANDRA-19902:
-----------------------------------------

It turns out it was trickier to revert CASSANDRA-11537 than to just fix this 
regression, because there was no test to verify the original behavior and this 
somewhat overlaps in trunk with CASSANDRA-18330 and CASSANDRA-19384.

To give a little context on CASSANDRA-11537, StorageServiceMBean was being 
published before StorageService had finished initializing. This caused weird 
errors (e.g. AssertionError) when nodetool was queried before StorageService 
initialization was complete.

The fix in CASSANDRA-11537 was to publish the JMX interface only after 
StorageService has finished initializing. An unintended side-effect was that 
StorageServiceMBean is no longer available during bootstrap on 5.0, preventing 
commands like "nodetool netstats"/"nodetool status"/"nodetool gossipinfo" on 
the bootstrapping node.

I created a JVM dtest 
[here|https://github.com/apache/cassandra/pull/3717/commits/b03d7df8fa9405b8b2c2bc7b1cbf2d877d77a2ad]
 based on {{{}GossipTest.testPreventStoppingGossipDuringBootstrap{}}}. This 
test passes on 4.1 but fails on 5.0, indicating there was an uncaught change 
of behavior introduced by CASSANDRA-11537.

The fix on 5.0 is straightforward: publish StorageServiceMBean in 
StorageService.initServer before bootstrap starts, not only after it 
finishes (done 
[here|https://github.com/apache/cassandra/pull/3717/commits/bb45635ff4abaa9b5d6d2984b0f888323200fe19]).
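
To illustrate the ordering, here is a minimal sketch using only the standard javax.management API. It is not the actual patch: DemoService, DemoServiceMBean, the object name and the Runnable hook are hypothetical stand-ins, and the bootstrap phase is simulated by a callback.

```java
import java.lang.management.ManagementFactory;
import javax.management.InstanceNotFoundException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class EarlyMBeanPublishSketch {

    /** Hypothetical stand-in for StorageServiceMBean (not the real interface). */
    public interface DemoServiceMBean {
        String getOperationMode();
    }

    /** Hypothetical stand-in for StorageService. */
    public static final class DemoService implements DemoServiceMBean {
        private volatile String mode = "STARTING";

        @Override
        public String getOperationMode() {
            return mode;
        }

        /** Publishes the MBean first, then runs the simulated bootstrap. */
        void initServer(MBeanServer server, ObjectName name, Runnable duringBootstrap) throws JMException {
            server.registerMBean(this, name); // publish before bootstrap starts
            mode = "JOINING";
            duringBootstrap.run();            // stands in for the long-running bootstrap phase
            mode = "NORMAL";
        }
    }

    /** Reads OperationMode over JMX, or reports that the MBean is not published yet. */
    static String queryMode(MBeanServer server, ObjectName name) {
        try {
            return (String) server.getAttribute(name, "OperationMode");
        } catch (InstanceNotFoundException e) {
            return "UNAVAILABLE"; // the error nodetool would hit if the MBean is missing
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.example.demo:type=DemoService"); // hypothetical name

        System.out.println("before initServer: " + queryMode(server, name));
        new DemoService().initServer(server, name,
                () -> System.out.println("during bootstrap: " + queryMode(server, name)));
        System.out.println("after initServer: " + queryMode(server, name));
    }
}
```

In this sketch a query made while the simulated bootstrap is running finds the MBean, whereas a query made before initServer gets InstanceNotFoundException.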

Interestingly, on trunk StorageServiceMBean is published during 
bootstrap as expected, since this was fixed by CASSANDRA-19384, but [this test 
assertion|https://github.com/apache/cassandra/blob/3f9176318b094c10f996ab7653ce5659ea69018f/test/distributed/org/apache/cassandra/distributed/test/ring/BootstrapTest.java#L328]
 fails - indicating the node state is being incorrectly reported as "STARTING" 
instead of "JOINING" during bootstrap.

I think the reason for this is that StorageService.getOperationMode was just 
returning the actual CMS state [if StorageService.initialized flag is 
set|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L3818],
 which is not the case while the node is bootstrapping. I think the fix for this 
is to return the CMS state once the log has finished replaying, [done 
here|https://github.com/apache/cassandra/pull/3716/commits/9613e3854ecd2665b091aa0a574fe833fa4ee994].
 Let me know if this makes sense [~marcuse].
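
The difference between the two guards can be sketched as follows. This is only a simplified model of the idea, not the real StorageService code: the Mode enum, the method names and the assumption that STARTING is the fallback when the guard is not satisfied are all hypothetical.

```java
final class OperationModeSketch {

    enum Mode { STARTING, JOINING, NORMAL }

    /** Model of the behavior described above: gate on the 'initialized' flag. */
    static Mode gatedOnInitialized(boolean initialized, Mode cmsState) {
        return initialized ? cmsState : Mode.STARTING;
    }

    /** Model of the proposed behavior: gate on whether the log has finished replaying. */
    static Mode gatedOnLogReplay(boolean logReplayed, Mode cmsState) {
        return logReplayed ? cmsState : Mode.STARTING;
    }

    public static void main(String[] args) {
        // A bootstrapping node in this model: log replayed, CMS state JOINING,
        // but the 'initialized' flag is not set yet.
        boolean initialized = false;
        boolean logReplayed = true;
        System.out.println("gated on initialized: " + gatedOnInitialized(initialized, Mode.JOINING));
        System.out.println("gated on log replay:  " + gatedOnLogReplay(logReplayed, Mode.JOINING));
    }
}
```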

I suspect the root cause of both CASSANDRA-11537 and CASSANDRA-19384 is 
that StorageServiceMBean was [being published in the StorageService 
constructor|https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/service/StorageService.java#L490]
 before these patches, causing it to be published on the first access to 
StorageService.instance before the basic server scaffolding was initialized by 
{{{}CassandraDaemon.setup{}}}.
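
A toy model of this ordering problem, assuming a singleton whose constructor has a publish side effect. EagerService, daemonSetup and the event list are hypothetical names used only for illustration; nothing here is Cassandra code.

```java
import java.util.ArrayList;
import java.util.List;

final class ConstructorPublishSketch {

    /** Records the order in which things happen. */
    static final List<String> events = new ArrayList<>();

    /** Hypothetical singleton that publishes its MBean from the constructor. */
    static final class EagerService {
        static final EagerService instance = new EagerService();

        private EagerService() {
            events.add("publish MBean"); // side effect runs during class initialization
        }

        boolean isReady() {
            return false;
        }
    }

    /** Stand-in for the daemon setup step that prepares the server scaffolding. */
    static void daemonSetup() {
        events.add("daemon setup");
    }

    /** Touches the singleton before setup, as any early caller might. */
    static List<String> touchThenSetup() {
        EagerService.instance.isReady(); // first access triggers the constructor
        daemonSetup();
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(touchThenSetup());
    }
}
```

Because the publish step lives in the constructor, whichever code touches the singleton first decides when the MBean becomes visible, regardless of whether setup has run.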

I think the correct fix for the issue above is to publish StorageServiceMBean 
during StorageService.initServer, since at that point everything needed to serve 
JMX commands should have been initialized and what is not should be guarded by 
the StorageService.initialized flag. I think this allows us to simplify and remove 
all external uses of StorageService.registerMBeans safely as done 
[here|https://github.com/apache/cassandra/pull/3716/commits/2e9004f7fa100bedfcccbff352c8a47a0435642e].

In addition to this I also included these two cosmetic fixes to do some cleanup:
1) Extract StorageService mbean name to constant 
([commit|https://github.com/apache/cassandra/pull/3716/commits/d94a126cb09ccf8b8869c8e13a0b35dc96ca41ce])
2) Remove StorageService.jmxObjectName since I didn't find a reason for this 
variable; it seems to have been introduced by CASSANDRA-4767 
([commit|https://github.com/apache/cassandra/pull/3716/commits/3f9176318b094c10f996ab7653ce5659ea69018f])

I submitted preliminary CI, looks like unrelated failures - will try to get a 
cleaner run:
 * [5.0 PR|https://github.com/apache/cassandra/pull/3717] 
[5.0-CI|https://ci-cassandra.apache.org/job/Cassandra-devbranch-5/88/testReport/]
 * [trunk PR|https://github.com/apache/cassandra/pull/3716] 
[trunk-CI|https://ci-cassandra.apache.org/job/Cassandra-devbranch-5/87/]

Let me know if you can review this [~marcuse] since this overlaps with CMS and 
is related to CASSANDRA-19384.

> StorageService JMX mbean is not available during bootstrap
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-19902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19902
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Tool/nodetool
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>            Priority: Normal
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Looks like the seemingly harmless cosmetic patch from CASSANDRA-11537 causes 
> the StorageServiceMBean to not be available during bootstrap. This causes 
> commands like "nodetool netstats/status/etc" to not be available on the 
> bootstrapping node with the following error:
> {code:none}
> - StackTrace --
> javax.management.InstanceNotFoundException: 
> org.apache.cassandra.db:type=StorageService
>         at 
> java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1083)
>         at 
> java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:637)
> {code}
> This ticket is just to revert CASSANDRA-11537, we can re-add the improvement 
> of that ticket later.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
