[
https://issues.apache.org/jira/browse/CASSANDRA-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ron Kuris updated CASSANDRA-9526:
---------------------------------
Attachment: PHI-Log-Debug-When-Close.patch.txt
PHI-Race-Condition.patch.txt
Monitor-Phi-JMX.patch.txt
There are three patches here. The main fix is in Monitor-Phi-JMX.patch.txt,
which fully resolves the reported issue.
While inspecting this code, I noticed a small, unlikely race condition. If the
first two phi values for a host arrive at the same time, one could be lost due
to the way the values are being added to the Hashtable. The second patch
(PHI-Race-Condition.patch.txt) closes that window by switching to a
ConcurrentHashMap and using putIfAbsent to atomically check for a prior value.
I doubt this could actually happen in the wild, but it's still good defensive
coding. It also removes the Hashtable, which synchronizes on every call.
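As a rough illustration of the putIfAbsent pattern described above (a minimal
sketch, not the code from the patch): the class name, the String endpoint key,
and the queue of arrival times are all hypothetical stand-ins for the failure
detector's real per-host state.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the failure detector's per-host bookkeeping.
public class ArrivalSketch {
    // One sample window per endpoint (a String key here, for brevity).
    private final ConcurrentMap<String, ConcurrentLinkedQueue<Long>> windows =
            new ConcurrentHashMap<>();

    public void report(String endpoint, long arrivalNanos) {
        ConcurrentLinkedQueue<Long> window = windows.get(endpoint);
        if (window == null) {
            ConcurrentLinkedQueue<Long> fresh = new ConcurrentLinkedQueue<>();
            // putIfAbsent returns the value another thread installed first,
            // or null if our fresh window was the one stored.
            window = windows.putIfAbsent(endpoint, fresh);
            if (window == null) {
                window = fresh;
            }
        }
        // Both racing threads now add to the same window, so no sample is lost.
        window.add(arrivalNanos);
    }

    public int sampleCount(String endpoint) {
        ConcurrentLinkedQueue<Long> window = windows.get(endpoint);
        return window == null ? 0 : window.size();
    }
}
```

With a plain get-then-put, two threads could each create a window for the same
new host and the second put would overwrite the first, dropping its sample.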
The third patch (PHI-Log-Debug-When-Close.patch.txt) generates debug log
messages when a phi value starts getting close to phi_convict_threshold. It's
a great way to see that phi_convict_threshold might be too low. It's not WARN
or even INFO because this could generate a lot of logs, though arguably it
could be. If someone has trouble with nodes going offline, they can turn up
the logging level and see whether phi_convict_threshold is the culprit.
There is also some other code cleanup in this patch.
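The idea of logging when phi gets close can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the patch: the
class name, the CLOSE_FRACTION cutoff, and the use of java.util.logging
(chosen to keep the example self-contained) are all hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of "log at debug level when phi nears the threshold".
public class PhiDebugSketch {
    private static final Logger logger =
            Logger.getLogger(PhiDebugSketch.class.getName());

    // Assumed cutoff: treat phi as "close" once it reaches half the threshold.
    static final double CLOSE_FRACTION = 0.5;

    // True when phi is near the convict threshold but has not yet reached it.
    static boolean isClose(double phi, double convictThreshold) {
        return phi >= convictThreshold * CLOSE_FRACTION && phi < convictThreshold;
    }

    static void maybeLog(String endpoint, double phi, double convictThreshold) {
        // FINE is java.util.logging's rough equivalent of a debug level,
        // so nothing is emitted unless debugging is turned up.
        if (isClose(phi, convictThreshold) && logger.isLoggable(Level.FINE)) {
            logger.fine("PHI for " + endpoint + " is " + phi
                    + ", approaching convict threshold " + convictThreshold);
        }
    }
}
```

Keeping the check in a small predicate makes the cutoff easy to adjust if the
messages turn out to be too noisy or too rare.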
> Provide a JMX hook to monitor phi values in the FailureDetector
> ---------------------------------------------------------------
>
> Key: CASSANDRA-9526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9526
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Ron Kuris
> Fix For: 2.0.x
>
> Attachments: Monitor-Phi-JMX.patch.txt,
> PHI-Log-Debug-When-Close.patch.txt, PHI-Race-Condition.patch.txt
>
>
> phi_convict_threshold can be tuned, but there's currently no way to monitor
> the phi values to see if you're getting close.
> The attached patch adds the ability to get these values via JMX.