[ 
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Elhemali updated HADOOP-9090:
-------------------------------------

    Attachment: HADOOP-9090.justEnhanceDefaultImpl.3.patch

Thanks Luke. It's a fundamentally hard problem: if we assume that putMetrics() 
from arbitrary sinks can hang, then we'll have to interrupt the thread that 
doing the work and hope that unsticks it (that's how thread pools cancel work 
for example). Having said that, I got uneasy about using stop(), and I agree 
with your comment that I was creating too many threads willy-nilly. The new 
patch thus does two things:
1. Creates a single thread (if needed) for on-demand publishing per sink, and 
reuses that for subsequent publishing. This should cut down considerably on how 
many threads are created.
2. If it sees a sink timeout, it interrupts the worker thread but doesn't 
stop() it (joins it and hangs with it if needed).

I personally think that's the most reliable thing we can do. One alternative as 
you say is to mark the worker thread as a daemon, so we don't have to join it 
and hang with it if it hangs, but just abandon it. The problem there is that 
sinks may be doing things to external systems that will leave them in 
inconsistent states if the process just shuts down while they're still working 
with no warning. Let me know what you think.
                
> Refactor MetricsSystemImpl to allow for an on-demand publish system
> -------------------------------------------------------------------
>
>                 Key: HADOOP-9090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9090
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Mostafa Elhemali
>            Priority: Minor
>         Attachments: HADOOP-9090.2.patch, 
> HADOOP-9090.justEnhanceDefaultImpl.2.patch, 
> HADOOP-9090.justEnhanceDefaultImpl.3.patch, 
> HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch
>
>
> We have a need to publish metrics out of some short-living processes, which 
> is not really well-suited to the current metrics system implementation which 
> periodically publishes metrics asynchronously (a behavior that works great 
> for long-living processes). Of course I could write my own metrics system, 
> but it seems like such a waste to rewrite all the awesome code currently in 
> the MetricsSystemImpl and supporting classes.
> The way I'm proposing to solve this is to:
> 1. Refactor the MetricsSystemImpl class into an abstract base 
> MetricsSystemImpl class (common configuration and other code) and a concrete 
> PeriodicPublishMetricsSystemImpl class (timer thread).
> 2. Refactor the MetricsSinkAdapter class into an abstract base 
> MetricsSinkAdapter class (common configuration and other code) and a concrete 
> AsyncMetricsSinkAdapter class (asynchronous publishing using the SinkQueue).
> 3. Derive a new simple class OnDemandPublishMetricsSystemImpl from 
> MetricsSystemImpl, that just exposes a synchronous publish() method to do all 
> the work.
> 4. Derive a SyncMetricsSinkAdapter class from MetricsSinkAdapter to just 
> synchronously push metrics to the underlying sink.
> Does that sound reasonable? I'll attach the patch with all this coded up and 
> simple tests (could use some polish I guess, but wanted to get everyone's 
> opinion first). Notice that this is somewhat of a breaking change since 
> MetricsSystemImpl is public (although it's marked with 
> InterfaceAudience.Private); if the breaking change is a problem I could just 
> rename the refactored classes so that PeriodicPublishMetricsSystemImpl is 
> still called MetricsSystemImpl (and MetricsSystemImpl -> 
> BaseMetricsSystemImpl).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to