Profiling Hadoop Job

Leonardo Urbina Wed, 07 Mar 2012 11:37:01 -0800

Hello everyone,

I have a Hadoop job that runs on several GB of data, and I am trying to
optimize it to reduce memory consumption and improve speed. I am following
the steps outlined in Tom White's "Hadoop: The Definitive Guide" for
profiling with HPROF (p. 161), setting the following properties on the
JobConf:


        job.setProfileEnabled(true);
        job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6," +
                "force=n,thread=y,verbose=n,file=%s");
        job.setProfileTaskRange(true, "0-2");   // profile map tasks 0-2
        job.setProfileTaskRange(false, "0-2");  // profile reduce tasks 0-2

I am running this locally on a single pseudo-distributed install of
Hadoop (0.20.2), and it fails with the following error:

Exception in thread "main" java.io.FileNotFoundException: attempt_201203071311_0004_m_000000_0.profile (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
        at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
        at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
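
One thing I notice in the trace: the file name in the exception is relative,
and the failure comes from java.io.FileOutputStream inside
JobClient.downloadProfile on the client side. That suggests (I have not
confirmed it) that the write targets the local directory the job was
launched from, not HDFS. A minimal sketch to check that guess, using only
the JDK; the class name CwdWriteCheck and the probe file name are made up
for illustration and are not part of Hadoop:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class CwdWriteCheck {
    // Tries to create a small file in the given directory through
    // FileOutputStream, then removes it. Returns true if the write worked.
    public static boolean canWriteTo(File dir) {
        File probe = new File(dir, "profile-write-probe.tmp"); // hypothetical name
        try (FileOutputStream out = new FileOutputStream(probe)) {
            out.write(0);
        } catch (IOException e) {
            // Covers FileNotFoundException, e.g. "Permission denied".
            return false;
        }
        probe.delete();
        return true;
    }

    public static void main(String[] args) {
        // Relative paths are resolved against the JVM's working directory.
        File cwd = new File(System.getProperty("user.dir"));
        System.out.println("working directory: " + cwd.getAbsolutePath());
        System.out.println("writable: " + canWriteTo(cwd));
    }
}
```

If this reports that the launch directory is not writable, then starting
the job from a directory I own may be enough, but that is an assumption on
my part.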

I can still access these logs directly from the tasktracker's logs
(through the web UI). For the sake of running this locally, I could just
ignore the error. However, I want to be able to profile the job once it is
deployed to our Hadoop cluster, and I need to be able to retrieve these
logs automatically. Do I need to change the permissions in HDFS to allow
for this? Any ideas on how to fix this? Thanks in advance,

Best,
-Leo

-- 
Leo Urbina
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Department of Mathematics
lurb...@mit.edu
