[
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mostafa Elhemali updated HADOOP-9090:
-------------------------------------
Attachment: HADOOP-9090.justEnhanceDefaultImpl.3.patch
Thanks Luke. It's a fundamentally hard problem: if we assume that putMetrics()
from arbitrary sinks can hang, then we'll have to interrupt the thread that
doing the work and hope that unsticks it (that's how thread pools cancel work
for example). Having said that, I got uneasy about using stop(), and I agree
with your comment that I was creating too many threads willy-nilly. The new
patch thus does two things:
1. Creates a single thread (if needed) for on-demand publishing per sink, and
reuses that for subsequent publishing. This should cut down considerably on how
many threads are created.
2. If it sees a sink timeout, it interrupts the worker thread but doesn't
stop() it (joins it and hangs with it if needed).
I personally think that's the most reliable thing we can do. One alternative as
you say is to mark the worker thread as a daemon, so we don't have to join it
and hang with it if it hangs, but just abandon it. The problem there is that
sinks may be doing things to external systems that will leave them in
inconsistent states if the process just shuts down while they're still working
with no warning. Let me know what you think.
> Refactor MetricsSystemImpl to allow for an on-demand publish system
> -------------------------------------------------------------------
>
> Key: HADOOP-9090
> URL: https://issues.apache.org/jira/browse/HADOOP-9090
> Project: Hadoop Common
> Issue Type: New Feature
> Components: metrics
> Reporter: Mostafa Elhemali
> Priority: Minor
> Attachments: HADOOP-9090.2.patch,
> HADOOP-9090.justEnhanceDefaultImpl.2.patch,
> HADOOP-9090.justEnhanceDefaultImpl.3.patch,
> HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch
>
>
> We have a need to publish metrics out of some short-living processes, which
> is not really well-suited to the current metrics system implementation which
> periodically publishes metrics asynchronously (a behavior that works great
> for long-living processes). Of course I could write my own metrics system,
> but it seems like such a waste to rewrite all the awesome code currently in
> the MetricsSystemImpl and supporting classes.
> The way I'm proposing to solve this is to:
> 1. Refactor the MetricsSystemImpl class into an abstract base
> MetricsSystemImpl class (common configuration and other code) and a concrete
> PeriodicPublishMetricsSystemImpl class (timer thread).
> 2. Refactor the MetricsSinkAdapter class into an abstract base
> MetricsSinkAdapter class (common configuration and other code) and a concrete
> AsyncMetricsSinkAdapter class (asynchronous publishing using the SinkQueue).
> 3. Derive a new simple class OnDemandPublishMetricsSystemImpl from
> MetricsSystemImpl, that just exposes a synchronous publish() method to do all
> the work.
> 4. Derive a SyncMetricsSinkAdapter class from MetricsSinkAdapter to just
> synchronously push metrics to the underlying sink.
> Does that sound reasonable? I'll attach the patch with all this coded up and
> simple tests (could use some polish I guess, but wanted to get everyone's
> opinion first). Notice that this is somewhat of a breaking change since
> MetricsSystemImpl is public (although it's marked with
> InterfaceAudience.Private); if the breaking change is a problem I could just
> rename the refactored classes so that PeriodicPublishMetricsSystemImpl is
> still called MetricsSystemImpl (and MetricsSystemImpl ->
> BaseMetricsSystemImpl).
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira