[ 
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506045#comment-13506045
 ] 

Luke Lu commented on HADOOP-9090:
---------------------------------

I don't think you can control what happens to external systems, which _should_ 
handle arbitrary connection errors etc. caused by unexpected client (or even 
router/firewall) shutdown. Another problem with the new patch is that you're 
creating a latch and a semaphore per record (vs. per MetricsBuffer), and there 
can be hundreds (up to a few thousand) of records per put. If the sink hangs, 
you'll be creating a new thread/latch/semaphore for each record, and the 
user-perceived timeout would be the configured timeout * number of records. 
Another issue is that the hang can happen in sink.flush as well.
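
To put rough numbers on the per-record cost, here is a small sketch. The class name, the helper, and the sample values (a 10 s timeout, 1000 records in one put) are assumptions made up for illustration, not taken from the patch:

```java
// Hypothetical helper; the numbers below are illustrative, not measured.
class PerRecordTimeoutCost {
  /** Worst-case total wait when each of waitPoints timed waits runs to its timeout. */
  static long worstCaseWaitMillis(long timeoutMillis, int waitPoints) {
    return Math.multiplyExact(timeoutMillis, (long) waitPoints);
  }

  public static void main(String[] args) {
    long timeoutMillis = 10_000; // assumed 10 s configured timeout
    int records = 1_000;         // assumed number of records in one put
    // One timed wait per record versus one timed wait per MetricsBuffer.
    System.out.println("per record: " + worstCaseWaitMillis(timeoutMillis, records) + " ms");
    System.out.println("per buffer: " + worstCaseWaitMillis(timeoutMillis, 1) + " ms");
  }
}
```

With a hung sink, waiting once per buffer keeps the caller's worst case at a single timeout instead of one timeout per record.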

Why not do the simple notification in the existing code like the following 
(untested sketch)?:
{code}
boolean oobPut;

// illustration only, should be in the ctor after retry* variables are defined
// (cast needed because Math.pow returns a double; wait() takes milliseconds)
final long OOB_PUT_TIMEOUT =
    (long) (retryDelay * Math.pow(retryBackoff, retryCount) * 1000);

synchronized void putMetricsImmediate(MetricsBuffer mb) {
  try {
    if (!oobPut) {
      oobPut = true;
      if (queue.enqueue(mb)) {
        wait(OOB_PUT_TIMEOUT);
      } // otherwise queue is full due to sink issues anyway.
      oobPut = false; // reset even if the enqueue failed
    } else { // another oobPut in progress
      wait(OOB_PUT_TIMEOUT);
      oobPut = false; // just in case
    }
  } catch (InterruptedException e) {
    oobPut = false;
    Thread.currentThread().interrupt(); // preserve the interrupt for callers
  }
}

// after queue.consumeAll(this); in publishMetricsFromQueue
// (which needs to be synchronized now)
if (oobPut) {
  notifyAll();
}
{code}

Now you get all the retry/timeout logic for free :)
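
For anyone who wants to try the wait/notifyAll handshake outside Hadoop, here is a minimal self-contained sketch. Everything in it is hypothetical: OobPutSketch, putImmediate and publishFromQueue are stand-ins, and a plain ArrayBlockingQueue replaces SinkQueue. Unlike the sketch above, it tracks a ticket counter so a waiter can tell a real publish from a spurious wakeup or a timeout:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for a sink adapter; NOT Hadoop's MetricsSinkAdapter.
class OobPutSketch {
  private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);
  private final long timeoutMillis;
  private long enqueued = 0; // batches accepted so far (guarded by this)
  private long drained = 0;  // batches the consumer has finished (guarded by this)

  OobPutSketch(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  /** Returns true if the batch reached the sink before the timeout elapsed. */
  synchronized boolean putImmediate(String batch) {
    if (!queue.offer(batch)) {
      return false; // queue full: the sink is already in trouble
    }
    long ticket = ++enqueued;
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    try {
      while (drained < ticket) { // loop guards against spurious wakeups
        long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
        if (remaining <= 0) {
          return false; // timed out; never call wait(0), which waits forever
        }
        wait(remaining); // releases the monitor while waiting
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Consumer side: publishes one batch, then wakes any waiting producers. */
  boolean publishFromQueue(Consumer<String> sink) {
    String batch;
    try {
      batch = queue.take();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    try {
      sink.accept(batch);
    } finally {
      synchronized (this) {
        drained++;
        notifyAll();
      }
    }
    return true;
  }

  public static void main(String[] args) {
    OobPutSketch adapter = new OobPutSketch(2000);
    Thread consumer = new Thread(() -> {
      while (adapter.publishFromQueue(b -> System.out.println("published " + b))) { }
    });
    consumer.setDaemon(true);
    consumer.start();
    System.out.println("delivered in time: " + adapter.putImmediate("batch-1"));
  }
}
```

If the sink hangs inside sink.accept, the consumer never bumps the counter, so the caller simply gives up after one timeout for the whole batch.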

                
> Refactor MetricsSystemImpl to allow for an on-demand publish system
> -------------------------------------------------------------------
>
>                 Key: HADOOP-9090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9090
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Mostafa Elhemali
>            Priority: Minor
>         Attachments: HADOOP-9090.2.patch, 
> HADOOP-9090.justEnhanceDefaultImpl.2.patch, 
> HADOOP-9090.justEnhanceDefaultImpl.3.patch, 
> HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch
>
>
> We need to publish metrics from some short-lived processes, which is not 
> well suited to the current metrics system implementation, which 
> periodically publishes metrics asynchronously (a behavior that works great 
> for long-lived processes). Of course I could write my own metrics system, 
> but it seems like such a waste to rewrite all the awesome code currently in 
> the MetricsSystemImpl and supporting classes.
> The way I'm proposing to solve this is to:
> 1. Refactor the MetricsSystemImpl class into an abstract base 
> MetricsSystemImpl class (common configuration and other code) and a concrete 
> PeriodicPublishMetricsSystemImpl class (timer thread).
> 2. Refactor the MetricsSinkAdapter class into an abstract base 
> MetricsSinkAdapter class (common configuration and other code) and a concrete 
> AsyncMetricsSinkAdapter class (asynchronous publishing using the SinkQueue).
> 3. Derive a new simple class OnDemandPublishMetricsSystemImpl from 
> MetricsSystemImpl, that just exposes a synchronous publish() method to do all 
> the work.
> 4. Derive a SyncMetricsSinkAdapter class from MetricsSinkAdapter to just 
> synchronously push metrics to the underlying sink.
> Does that sound reasonable? I'll attach the patch with all this coded up and 
> simple tests (could use some polish I guess, but wanted to get everyone's 
> opinion first). Notice that this is somewhat of a breaking change since 
> MetricsSystemImpl is public (although it's marked with 
> InterfaceAudience.Private); if the breaking change is a problem I could just 
> rename the refactored classes so that PeriodicPublishMetricsSystemImpl is 
> still called MetricsSystemImpl (and MetricsSystemImpl -> 
> BaseMetricsSystemImpl).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
