[ 
https://issues.apache.org/jira/browse/PHOENIX-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893092#comment-15893092
 ] 

Samarth Jain commented on PHOENIX-3062:
---------------------------------------

Oops, yes. I meant HBASE-16211.

[~jamestaylor] - I think this might be an actual issue. In 
org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl we 
schedule the JmxCacheBuster to clear the JMX cache every 5 minutes. 

{code}
    // Every few mins clean the JMX cache.
    executor.getExecutor().scheduleWithFixedDelay(new Runnable() {
      public void run() {
        JmxCacheBuster.clearJmxCache();
      }
    }, 5, 5, TimeUnit.MINUTES);
{code}

Before HBASE-16211, JmxCacheBuster.clearJmxCache() would simply restart 
(!!) the entire metrics system. 

{code}
      try {
        if (DefaultMetricsSystem.instance() != null) {
          DefaultMetricsSystem.instance().stop();
          // Sleep some time so that the rest of the hadoop metrics
          // system knows that things are done
          Thread.sleep(500);
          DefaultMetricsSystem.instance().start();
        }
      } catch (Exception exception) {
        LOG.debug("error clearing the jmx it appears the metrics system hasn't been started",
            exception);
      }
{code}

Stopping the metrics system internally stops all the sinks and clears the map 
in which it keeps references to those sinks.

{code}
  private synchronized void stopSinks() {
    for (Entry<String, MetricsSinkAdapter> entry : sinks.entrySet()) {
      MetricsSinkAdapter sa = entry.getValue();
      LOG.debug("Stopping metrics sink "+ entry.getKey() +
          ": class=" + sa.sink().getClass());
      sa.stop();
    }
    sinks.clear();
  }
{code}

This means the start() method in the MetricsSystem doesn't know which sinks it 
should re-register. So even if PhoenixMetricsSink was registered, it would be 
removed within 5 minutes by the JmxCacheBuster scheduled in 
MetricsRegionAggregateSourceImpl, making tracing unusable. 
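
To illustrate the effect, here is a minimal toy sketch of the stop/start behavior described above. It is not the Hadoop or HBase code: ToyMetricsSystem, its Sink interface, and the sink name "phoenix-tracing" are hypothetical names made up for this example, and it only assumes that stop() clears the sink map and start() does not repopulate it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-in for a metrics system; NOT the real Hadoop implementation. */
public class ToyMetricsSystem {
    /** Hypothetical sink interface, for illustration only. */
    public interface Sink {
        void stop();
    }

    // Mirrors the idea of a name -> sink map that stop() clears.
    private final Map<String, Sink> sinks = new LinkedHashMap<>();
    private boolean running;

    public synchronized void register(String name, Sink sink) {
        sinks.put(name, sink);
    }

    public synchronized void start() {
        // Assumption of this sketch: start() has no record of sinks
        // that were registered before the previous stop().
        running = true;
    }

    public synchronized void stop() {
        for (Sink sink : sinks.values()) {
            sink.stop();
        }
        sinks.clear(); // references to all sinks are dropped here
        running = false;
    }

    public synchronized boolean hasSink(String name) {
        return sinks.containsKey(name);
    }

    public synchronized boolean isRunning() {
        return running;
    }

    public static void main(String[] args) {
        ToyMetricsSystem metrics = new ToyMetricsSystem();
        metrics.register("phoenix-tracing", () -> { });
        metrics.start();
        System.out.println("before restart: " + metrics.hasSink("phoenix-tracing"));

        // What a cache-buster style restart does in this toy model.
        metrics.stop();
        metrics.start();
        System.out.println("after restart: " + metrics.hasSink("phoenix-tracing"));
    }
}
```

In this toy model the system is running again after the restart, but the custom sink is gone, so anything waiting on that sink to receive data would wait forever.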

I am not too sure how classes like MetricsRegionAggregateSourceImpl and 
MetricsReplicationSourceSourceImpl are used. I am guessing they have to do with 
publishing various internal HBase metrics via JMX. 

Probably [~elserj] or [~enis] would know? 


> JMXCacheBuster restarting the metrics system causes PhoenixTracingEndToEndIT 
> to hang
> ------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3062
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3062
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 4.10.0
>
>         Attachments: phoenix-3062_v1.patch
>
>
> With some recent fixes in the hbase metrics system, we are now effectively 
> restarting the metrics system (in HBase-1.3.0, probably not affecting 1.2.0). 
> Since we use a custom sink in the PhoenixTracingEndToEndIT, restarting the 
> metrics system loses the registered sink thus causing a hang. 
> We need a fix in HBase, and Phoenix so that we will not restart the metrics 
> during tests. 
> Thanks to [~sergey.soldatov] for analyzing the initial root cause of the 
> hang. 
> See HBASE-14166 and others. 



