[ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HADOOP-5469:
------------------------------------

    Status: Patch Available  (was: Open)

> Exposing Hadoop metrics via HTTP
> --------------------------------
>
>                 Key: HADOOP-5469
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5469
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Philip Zeyliger
>         Attachments: HADOOP-5469.patch
>
>
> I'd like to be able to query Hadoop's metrics via HTTP, e.g., by going to 
> "/metrics" on any Hadoop daemon that has an HttpServer.  My motivation is 
> pretty simple--if you're running on a lot of machines, tracking down the 
> relevant metrics files is pretty time-consuming; this would be a useful 
> debugging utility.  I'd also like the output to be parseable, so I could 
> write a quick web app to query the metrics dynamically.
> This is similar in spirit to, but different from, just using JMX.  (See also 
> HADOOP-4756.)  JMX requires a client, and, more annoyingly, JMX requires 
> setting up authentication.  If you just disable authentication, someone can 
> do Bad Things, and if you enable it, you have to worry about yet another 
> password. Exposing metrics over HTTP is also more complete--JMX requires 
> separate instrumentation, so, for example, the JobTracker's metrics aren't 
> exposed via JMX.
> To get the discussion going, I've attached a patch.  I had to add a method 
> to ContextFactory to get all the active MetricsContexts, implement a do-little 
> MetricsContext that simply inherits from AbstractMetricsContext, add a method 
> to MetricsContext to get all the records, expose copy methods for the maps in 
> OutputRecord, and implement a simple servlet.  I also factored the common 
> period-setting code out of all the MetricsContexts; I'm open to taking that 
> out if it muddies the patch significantly.
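
A minimal sketch of how such a servlet could turn the collected records into the text format, assuming the copied tag and metric maps from each OutputRecord are already in hand. `MetricsTextRenderer`, `MetricRecord`, and `renderText` are hypothetical names, not Hadoop classes and not the patch's code; the indentation (context, record name, tags, metric values at 0, 2, 4, and 6 spaces) is chosen only to mirror the sample output in this issue.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MetricsTextRenderer {

    // Hypothetical stand-in for one OutputRecord: its tags plus its metric values.
    record MetricRecord(Map<String, String> tags, Map<String, Number> metrics) {}

    // contexts: context name -> record name -> records emitted under that name.
    static String renderText(Map<String, Map<String, List<MetricRecord>>> contexts) {
        StringBuilder out = new StringBuilder();
        for (var ctx : new TreeMap<>(contexts).entrySet()) {
            out.append(ctx.getKey()).append('\n');                  // e.g. "jvm"
            for (var rec : new TreeMap<>(ctx.getValue()).entrySet()) {
                out.append("  ").append(rec.getKey()).append('\n'); // e.g. "metrics"
                for (MetricRecord r : rec.getValue()) {
                    // Tags print as a sorted map, e.g. {hostName=..., sessionId=}
                    out.append("    ").append(new TreeMap<>(r.tags())).append('\n');
                    // Metric values print one per line, sorted by name.
                    for (var m : new TreeMap<>(r.metrics()).entrySet()) {
                        out.append("      ").append(m.getKey()).append('=')
                           .append(m.getValue()).append('\n');
                    }
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tags = Map.of("hostName", "doorstop.local", "sessionId", "");
        Map<String, Number> metrics = Map.of("jobs_submitted", 1, "maps_launched", 5);
        Map<String, Map<String, List<MetricRecord>>> contexts =
            Map.of("mapred", Map.of("jobtracker", List.of(new MetricRecord(tags, metrics))));
        System.out.print(renderText(contexts));
    }
}
```

Copying into TreeMaps is a design choice for stable, diffable output between requests; it is not something the patch is known to do.
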
> I'd love to hear your suggestions.  There's a bug in the JSON representation, 
> and there's some gross type-handling.
> The patch is missing tests.  I wanted to post to gather feedback before I got 
> too far, but tests are forthcoming.
> Here's sample output from a JobTracker while it was running a "pi" job:
> {noformat}
> jvm
>   metrics
>     {hostName=doorstop.local, processName=JobTracker, sessionId=}
>       gcCount=22
>       gcTimeMillis=68
>       logError=0
>       logFatal=0
>       logInfo=52
>       logWarn=0
>       memHeapCommittedM=7.4375
>       memHeapUsedM=4.2150116
>       memNonHeapCommittedM=23.1875
>       memNonHeapUsedM=18.438614
>       threadsBlocked=0
>       threadsNew=0
>       threadsRunnable=7
>       threadsTerminated=0
>       threadsTimedWaiting=8
>       threadsWaiting=15
> mapred
>   job
>     {counter=Map input records, group=Map-Reduce Framework, 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=2.0
>     {counter=Map output records, group=Map-Reduce Framework, 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=4.0
>     {counter=Data-local map tasks, group=Job Counters , 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=4.0
>     {counter=Map input bytes, group=Map-Reduce Framework, 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=48.0
>     {counter=FILE_BYTES_WRITTEN, group=FileSystemCounters, 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=148.0
>     {counter=Combine output records, group=Map-Reduce Framework, 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=0.0
>     {counter=Launched map tasks, group=Job Counters , 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=4.0
>     {counter=HDFS_BYTES_READ, group=FileSystemCounters, 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=236.0
>     {counter=Map output bytes, group=Map-Reduce Framework, 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=64.0
>     {counter=Launched reduce tasks, group=Job Counters , 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=1.0
>     {counter=Spilled Records, group=Map-Reduce Framework, 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=4.0
>     {counter=Combine input records, group=Map-Reduce Framework, 
> hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, 
> sessionId=, user=philip}
>       value=0.0
>   jobtracker
>     {hostName=doorstop.local, sessionId=}
>       jobs_completed=0
>       jobs_submitted=1
>       maps_completed=2
>       maps_launched=5
>       reduces_completed=0
>       reduces_launched=1
> rpc
>   metrics
>     {hostName=doorstop.local, port=50030}
>       NumOpenConnections=2
>       RpcProcessingTime_avg_time=0
>       RpcProcessingTime_num_ops=84
>       RpcQueueTime_avg_time=1
>       RpcQueueTime_num_ops=84
>       callQueueLen=0
>       getBuildVersion_avg_time=0
>       getBuildVersion_num_ops=1
>       getJobProfile_avg_time=0
>       getJobProfile_num_ops=17
>       getJobStatus_avg_time=0
>       getJobStatus_num_ops=32
>       getNewJobId_avg_time=0
>       getNewJobId_num_ops=1
>       getProtocolVersion_avg_time=0
>       getProtocolVersion_num_ops=2
>       getSystemDir_avg_time=0
>       getSystemDir_num_ops=2
>       getTaskCompletionEvents_avg_time=0
>       getTaskCompletionEvents_num_ops=19
>       heartbeat_avg_time=5
>       heartbeat_num_ops=9
>       submitJob_avg_time=0
>       submitJob_num_ops=1
> {noformat}
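
Because each level of the output above is marked purely by indentation (context at column 0, record name at 2 spaces, tags at 4, metric values at 6), the "quick web app" mentioned earlier could read it back with a small parser. The Java sketch below is hypothetical: `MetricsTextParser`, `parse`, and the flat `context/record/{tags}/metric` key are illustrative choices, not part of the patch. It assumes raw servlet output, i.e. without this email's "> " prefixes and without the hard wraps that split some tag lines above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetricsTextParser {

    // Parses indentation-based metrics text into flat
    // "context/record/{tags}/metric" -> value entries, in input order.
    static Map<String, String> parse(String text) {
        Map<String, String> values = new LinkedHashMap<>();
        String context = null, recordName = null, tags = null;
        for (String line : text.split("\n")) {
            if (line.isBlank()) continue;
            int indent = 0;
            while (indent < line.length() && line.charAt(indent) == ' ') indent++;
            String body = line.substring(indent).trim();
            switch (indent) {
                case 0 -> context = body;      // e.g. "jvm"
                case 2 -> recordName = body;   // e.g. "metrics"
                case 4 -> tags = body;         // e.g. "{hostName=..., sessionId=}"
                case 6 -> {                    // e.g. "gcCount=22"
                    int eq = body.indexOf('=');
                    if (eq < 0) throw new IllegalArgumentException("Bad metric line: " + line);
                    String key = context + "/" + recordName + "/" + tags + "/" + body.substring(0, eq);
                    values.put(key, body.substring(eq + 1));
                }
                default -> throw new IllegalArgumentException("Unexpected indent: " + line);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "mapred",
            "  jobtracker",
            "    {hostName=doorstop.local, sessionId=}",
            "      jobs_submitted=1",
            "      maps_launched=5");
        parse(sample).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Keeping the whole tag string in the key means records that share a name but differ in tags (such as the per-counter "job" records above) stay distinct; values are left as strings because the sample mixes integers and floats.
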

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
