[
https://issues.apache.org/jira/browse/PHOENIX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893092#comment-15893092
]
Samarth Jain commented on PHOENIX-3062:
---------------------------------------
Oops, yes. I meant HBASE-16211.
[~jamestaylor] - I think this might be an actual issue. In
org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl we
schedule the JmxCacheBuster to clear out the JMX cache every 5 minutes.
{code}
// Every few mins clean the JMX cache.
executor.getExecutor().scheduleWithFixedDelay(new Runnable() {
  public void run() {
    JmxCacheBuster.clearJmxCache();
  }
}, 5, 5, TimeUnit.MINUTES);
{code}
Before HBASE-16211, JmxCacheBuster.clearJmxCache() would simply restart
(!!) the entire metrics system.
{code}
try {
  if (DefaultMetricsSystem.instance() != null) {
    DefaultMetricsSystem.instance().stop();
    // Sleep some time so that the rest of the hadoop metrics
    // system knows that things are done
    Thread.sleep(500);
    DefaultMetricsSystem.instance().start();
  }
} catch (Exception exception) {
  LOG.debug("error clearing the jmx it appears the metrics system hasn't been started",
      exception);
}
{code}
Stopping the metrics system internally stops all the sinks and clears the
collection in which it maintains references to those sinks.
{code}
private synchronized void stopSinks() {
  for (Entry<String, MetricsSinkAdapter> entry : sinks.entrySet()) {
    MetricsSinkAdapter sa = entry.getValue();
    LOG.debug("Stopping metrics sink "+ entry.getKey() +
        ": class=" + sa.sink().getClass());
    sa.stop();
  }
  sinks.clear();
}
{code}
This means the start() method in the MetricsSystem doesn't know which sinks it
should be re-registering. So even if PhoenixMetricsSink was registered, it
would be removed within 5 minutes by the JmxCacheBuster via
MetricsRegionAggregateSourceImpl, making tracing unusable.
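To make the failure mode concrete, here is a minimal toy sketch in plain Java. ToyMetricsSystem, its methods, and the restart() helper are hypothetical stand-ins that I made up for illustration; they are not the Hadoop metrics2 classes, and the only behaviour they copy is what the snippets above show: stop() clears the sink collection and start() does not re-register anything.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a hypothetical stand-in, NOT the Hadoop metrics system.
public class SinkLossSketch {

    static class ToyMetricsSystem {
        // Registered sinks by name, like the "sinks" map in stopSinks().
        private final Map<String, Runnable> sinks = new LinkedHashMap<>();
        private boolean running;

        // Assumption: start() only flips the running flag and has no
        // record of which sinks were registered before a stop().
        synchronized void start() {
            running = true;
        }

        synchronized void register(String name, Runnable sink) {
            sinks.put(name, sink);
        }

        // Mirrors stopSinks(): drop every sink reference on stop.
        synchronized void stop() {
            sinks.clear();
            running = false;
        }

        // Pushes one round of metrics; returns how many sinks were notified.
        synchronized int publish() {
            if (!running) {
                return 0;
            }
            for (Runnable sink : sinks.values()) {
                sink.run();
            }
            return sinks.size();
        }
    }

    // Mirrors the shape of the old clearJmxCache(): stop, then start.
    static void restart(ToyMetricsSystem ms) {
        ms.stop();
        ms.start();
    }

    public static void main(String[] args) {
        ToyMetricsSystem ms = new ToyMetricsSystem();
        ms.start();
        // Hypothetical sink name standing in for PhoenixMetricsSink.
        ms.register("phoenix-tracing", () -> { });

        System.out.println("sinks notified before restart: " + ms.publish());
        restart(ms);
        System.out.println("sinks notified after restart: " + ms.publish());
    }
}
```

In this toy model the sink is notified before the restart and silently dropped after it, which matches what we would expect to see with tracing once the periodic cache buster has fired.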
I am not too sure how classes like MetricsRegionAggregateSourceImpl and
MetricsReplicationSourceSourceImpl are used. I am guessing they have to do with
publishing various internal HBase metrics via JMX.
Probably [~elserj] or [~enis] would know?
> JMXCacheBuster restarting the metrics system causes PhoenixTracingEndToEndIT
> to hang
> ------------------------------------------------------------------------------------
>
> Key: PHOENIX-3062
> URL: https://issues.apache.org/jira/browse/PHOENIX-3062
> Project: Phoenix
> Issue Type: Bug
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 4.10.0
>
> Attachments: phoenix-3062_v1.patch
>
>
> With some recent fixes in the hbase metrics system, we are now effectively
> restarting the metrics system (in HBase-1.3.0, probably not affecting 1.2.0).
> Since we use a custom sink in the PhoenixTracingEndToEndIT, restarting the
> metrics system loses the registered sink thus causing a hang.
> We need a fix in both HBase and Phoenix so that we do not restart the metrics
> system during tests.
> Thanks to [~sergey.soldatov] for analyzing the initial root cause of the
> hang.
> See HBASE-14166 and others.