[ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506045#comment-13506045 ]
Luke Lu commented on HADOOP-9090:
---------------------------------
I don't think you can control what happens to external systems, which _should_
handle arbitrary connection errors, etc., caused by an unexpected client (or even
router/firewall) shutdown. Another problem with the new patch is that you're
creating a latch and a semaphore per record (vs. per MetricsBuffer), and there
can be hundreds (up to a few thousand) of records per put. If the sink hangs,
you'll be creating a new thread/latch/semaphore for every record, and the
user-perceived timeout would be the configured timeout * the number of records.
Another issue is that hanging can happen in sink.flush as well.
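To make the timeout arithmetic concrete, here is a small, self-contained simulation (not Hadoop code; the class and method names are hypothetical). It models a hung sink as a `CountDownLatch` that is never counted down, and compares one timed wait per record against one timed wait per buffer. The 10 records and 20 ms timeout are illustrative values chosen only to keep the run short.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation, not Hadoop code: a hung sink is modeled as a latch
// that is never counted down, so every timed wait runs for its full timeout.
public class PerRecordTimeoutDemo {

  // Waits on the latch for up to timeoutMs; on interrupt, restores the flag.
  private static void awaitQuietly(CountDownLatch latch, long timeoutMs) {
    try {
      latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // One latch per record: the caller pays the full timeout once per record.
  static long perRecordElapsedMs(int records, long timeoutMs) {
    long start = System.nanoTime();
    for (int i = 0; i < records; i++) {
      awaitQuietly(new CountDownLatch(1), timeoutMs);
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  // One latch per buffer: the whole put is bounded by a single timeout.
  static long perBufferElapsedMs(long timeoutMs) {
    long start = System.nanoTime();
    awaitQuietly(new CountDownLatch(1), timeoutMs);
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) {
    int records = 10;    // real puts may carry hundreds or thousands of records
    long timeoutMs = 20; // illustrative configured timeout
    System.out.println("per record: " + perRecordElapsedMs(records, timeoutMs) + " ms");
    System.out.println("per buffer: " + perBufferElapsedMs(timeoutMs) + " ms");
  }
}
```

At realistic sizes the gap is large: 2,000 records at a 1 s timeout would add up to over 33 minutes of blocking, versus 1 s for a single per-buffer wait.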
Why not do a simple notification in the existing code, like the following
(untested sketch)?
{code}
boolean oobPut; // guarded by this
// illustration only, should be in the ctor after the retry* variables are defined
// (Math.pow returns a double, hence the cast)
final long OOB_PUT_TIMEOUT =
    (long) (retryDelay * Math.pow(retryBackoff, retryCount) * 1000);

synchronized void putMetricsImmediate(MetricsBuffer mb)
    throws InterruptedException {
  if (!oobPut) {
    oobPut = true;
    if (queue.enqueue(mb)) {
      // wait() releases the monitor so the sink thread can publish and notify
      wait(OOB_PUT_TIMEOUT);
    } // otherwise the queue is full due to sink issues anyway.
    oobPut = false; // reset on both paths so a full queue doesn't leave it set
  } else { // another oobPut in progress
    wait(OOB_PUT_TIMEOUT);
    oobPut = false; // just in case
  }
}

// after queue.consumeAll(this); in publishMetricsFromQueue (which needs to be
// synchronized now)
if (oobPut) {
  notifyAll(); // wake any caller waiting in putMetricsImmediate
}
{code}
Now you get all the retry/timeout logic for free :)
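The notification pattern can be exercised outside Hadoop with a minimal, runnable sketch. Everything here is a hypothetical stand-in: `String` replaces `MetricsBuffer`, an `ArrayBlockingQueue` replaces the adapter's `SinkQueue`, and `consumeOne()` plays the role of the sink thread's `publishMetricsFromQueue`. Unlike the sketch above, it waits in a loop against a deadline so that a spurious wakeup cannot end the wait early, and it returns whether the buffer was actually published before the timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for a sink adapter: Strings replace MetricsBuffer and an
// ArrayBlockingQueue replaces SinkQueue. Not Hadoop code.
public class OobPutSketch {
  private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
  private final List<String> published = new ArrayList<>(); // guarded by this
  private final long oobPutTimeoutMs;
  private boolean oobPut; // guarded by this

  OobPutSketch(int retryDelaySec, float retryBackoff, int retryCount) {
    // Same formula as in the comment: retryDelay * retryBackoff^retryCount, in ms.
    oobPutTimeoutMs = (long) (retryDelaySec * Math.pow(retryBackoff, retryCount) * 1000);
  }

  long oobPutTimeoutMs() {
    return oobPutTimeoutMs;
  }

  // Caller side: enqueue, then block until the consumer publishes or we time out.
  synchronized boolean putMetricsImmediate(String buffer) {
    if (!queue.offer(buffer)) {
      return false; // queue full: the sink is already in trouble
    }
    oobPut = true;
    long deadline = System.currentTimeMillis() + oobPutTimeoutMs;
    try {
      // wait() releases the monitor so consumeOne() can run; the loop guards
      // against spurious wakeups.
      while (!published.contains(buffer)) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false; // timed out: the sink is hung or too slow
        }
        wait(remaining);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      oobPut = false;
    }
  }

  // Sink-thread side: take a buffer (blocking outside the monitor), publish it,
  // then wake any caller parked in putMetricsImmediate.
  void consumeOne() throws InterruptedException {
    String buffer = queue.take();
    synchronized (this) {
      published.add(buffer); // stands in for sink.putMetrics() + sink.flush()
      if (oobPut) {
        notifyAll();
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    OobPutSketch adapter = new OobPutSketch(10, 2f, 1); // illustrative retry settings
    System.out.println(adapter.oobPutTimeoutMs()); // 20000
    Thread sinkThread = new Thread(() -> {
      try {
        adapter.consumeOne();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    sinkThread.start();
    System.out.println(adapter.putMetricsImmediate("buffer-1")); // true
    sinkThread.join();
  }
}
```

With the illustrative settings retryDelay = 10 s, retryBackoff = 2 and retryCount = 1, the timeout works out to 10 * 2^1 * 1000 = 20000 ms; a caller whose buffer is never consumed gets `false` back after that long instead of hanging.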
> Refactor MetricsSystemImpl to allow for an on-demand publish system
> -------------------------------------------------------------------
>
> Key: HADOOP-9090
> URL: https://issues.apache.org/jira/browse/HADOOP-9090
> Project: Hadoop Common
> Issue Type: New Feature
> Components: metrics
> Reporter: Mostafa Elhemali
> Priority: Minor
> Attachments: HADOOP-9090.2.patch,
> HADOOP-9090.justEnhanceDefaultImpl.2.patch,
> HADOOP-9090.justEnhanceDefaultImpl.3.patch,
> HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch
>
>
> We have a need to publish metrics out of some short-lived processes, which
> is not really well suited to the current metrics system implementation, which
> periodically publishes metrics asynchronously (a behavior that works great
> for long-lived processes). Of course I could write my own metrics system,
> but it seems like such a waste to rewrite all the awesome code currently in
> the MetricsSystemImpl and supporting classes.
> The way I'm proposing to solve this is to:
> 1. Refactor the MetricsSystemImpl class into an abstract base
> MetricsSystemImpl class (common configuration and other code) and a concrete
> PeriodicPublishMetricsSystemImpl class (timer thread).
> 2. Refactor the MetricsSinkAdapter class into an abstract base
> MetricsSinkAdapter class (common configuration and other code) and a concrete
> AsyncMetricsSinkAdapter class (asynchronous publishing using the SinkQueue).
> 3. Derive a new simple class OnDemandPublishMetricsSystemImpl from
> MetricsSystemImpl that just exposes a synchronous publish() method to do all
> the work.
> 4. Derive a SyncMetricsSinkAdapter class from MetricsSinkAdapter to just
> synchronously push metrics to the underlying sink.
> Does that sound reasonable? I'll attach the patch with all this coded up and
> simple tests (could use some polish I guess, but wanted to get everyone's
> opinion first). Notice that this is somewhat of a breaking change since
> MetricsSystemImpl is public (although it's marked with
> InterfaceAudience.Private); if the breaking change is a problem I could just
> rename the refactored classes so that PeriodicPublishMetricsSystemImpl is
> still called MetricsSystemImpl (and MetricsSystemImpl ->
> BaseMetricsSystemImpl).
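The proposed split can be sketched in miniature as follows. The class names are simplified, hypothetical stand-ins for the proposed base `MetricsSystemImpl`, `PeriodicPublishMetricsSystemImpl`, `OnDemandPublishMetricsSystemImpl` and `SyncMetricsSinkAdapter`, and strings stand in for metrics records; an asynchronous, queue-based adapter is omitted for brevity. The point is that both systems share one publish pass and differ only in what triggers it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical, simplified stand-ins for the proposed classes; not Hadoop code.
// Strings stand in for metrics records.
public class OnDemandMetricsSketch {

  // Minimal sink contract.
  interface Sink {
    void putMetrics(List<String> records);
  }

  // Common adapter base (config, filters, etc. would live here).
  abstract static class SinkAdapter {
    final Sink sink;
    SinkAdapter(Sink sink) { this.sink = sink; }
    abstract void put(List<String> records);
  }

  // Synchronous adapter: pushes straight to the sink on the caller's thread.
  static class SyncSinkAdapter extends SinkAdapter {
    SyncSinkAdapter(Sink sink) { super(sink); }
    @Override void put(List<String> records) { sink.putMetrics(records); }
  }

  // Common base for metrics systems: owns records and adapters, and can run
  // one publish pass. Subclasses only decide what triggers that pass.
  abstract static class BaseMetricsSystem {
    private final List<String> records = new ArrayList<>();
    private final List<SinkAdapter> adapters = new ArrayList<>();

    synchronized void register(String record) { records.add(record); }
    synchronized void addSink(SinkAdapter adapter) { adapters.add(adapter); }

    // Snapshot the current records and hand the snapshot to every adapter.
    protected synchronized void publishOnce() {
      List<String> snapshot = new ArrayList<>(records);
      for (SinkAdapter adapter : adapters) {
        adapter.put(snapshot);
      }
    }
  }

  // For short-lived processes: the caller triggers publishing explicitly.
  static class OnDemandPublishMetricsSystem extends BaseMetricsSystem {
    void publish() { publishOnce(); }
  }

  // For long-lived processes: a daemon timer triggers publishing periodically.
  static class PeriodicPublishMetricsSystem extends BaseMetricsSystem {
    private final Timer timer = new Timer("metrics-timer", true);

    void start(long periodMs) {
      timer.scheduleAtFixedRate(new TimerTask() {
        @Override public void run() { publishOnce(); }
      }, periodMs, periodMs);
    }

    void stop() { timer.cancel(); }
  }

  public static void main(String[] args) {
    List<String> received = new ArrayList<>();
    OnDemandPublishMetricsSystem ms = new OnDemandPublishMetricsSystem();
    ms.addSink(new SyncSinkAdapter(received::addAll));
    ms.register("jvm.heapUsed=42");
    ms.publish(); // synchronous: the sink has the records when this returns
    System.out.println(received); // [jvm.heapUsed=42]
  }
}
```

A `PeriodicPublishMetricsSystem` would be set up with the same `register`/`addSink` calls followed by `start(periodMs)`; only the trigger differs, which is what lets the short-lived and long-lived cases share the common code.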
--
This message is automatically generated by JIRA.