[ https://issues.apache.org/jira/browse/HADOOP-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653780#action_12653780 ]
he yongqiang commented on HADOOP-4780:
--------------------------------------

It seems that FileUtil.getDU(new File(baseDir.toString())) is quite time-consuming. Maybe we could just remove the call to FileUtil.getDU() in DistributedCache.getLocalCache. I don't think it matters if this statement is removed.

> Task Tracker burns a lot of cpu in calling getLocalCache
> ---------------------------------------------------------
>
>                 Key: HADOOP-4780
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4780
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>   Affects Versions: 0.18.2
>            Reporter: Runping Qi
>
> I noticed that, many times, a task tracker maxes out up to 6 CPUs.
> During that time, iostat showed that the majority of that was system CPU.
> That situation can last for quite a long time.
> During that time, I saw a number of threads in the following state:
>    java.lang.Thread.State: RUNNABLE
>         at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>         at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
>         at java.io.File.exists(File.java:733)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:399)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:407)
>         at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:176)
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:140)
> I suspect that getLocalCache is too expensive.
> And calling it for every task initialization seems too wasteful.
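
If the size check cannot simply be dropped, one alternative is to walk the directory once and keep a running total afterwards. The following is only a rough sketch of that idea, not the actual Hadoop code: CacheSizeTracker, entryAdded, entryRemoved and diskUsage are hypothetical names, and diskUsage is merely similar in spirit to the recursive FileUtil.getDU frames in the trace above.

```java
import java.io.File;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: track the cache size incrementally instead of re-walking the tree. */
class CacheSizeTracker {
    // Running total of bytes under the cache base directory.
    private final AtomicLong cachedBytes = new AtomicLong();

    /**
     * Recursive walk over a directory tree. Every file costs at least one
     * stat-like system call, so the cost grows with the number of cached files.
     */
    static long diskUsage(File path) {
        if (!path.exists()) {
            return 0;
        }
        if (!path.isDirectory()) {
            return path.length();
        }
        long size = 0;
        File[] children = path.listFiles();
        if (children != null) {
            for (File child : children) {
                size += diskUsage(child);
            }
        }
        return size;
    }

    /** Pay for the full walk once, when the tracker is created. */
    CacheSizeTracker(File baseDir) {
        cachedBytes.set(diskUsage(baseDir));
    }

    /** Call after localizing a new cache entry of the given size. */
    long entryAdded(long bytes) {
        return cachedBytes.addAndGet(bytes);
    }

    /** Call after deleting a cache entry of the given size. */
    long entryRemoved(long bytes) {
        return cachedBytes.addAndGet(-bytes);
    }

    /** Constant-time read, suitable for a per-task code path. */
    long size() {
        return cachedBytes.get();
    }
}
```

With something like this, the per-task path would only read size(), and the full walk would happen once per task tracker rather than once per task. Whether the counter stays accurate depends on every add and delete going through the tracker, which is an assumption about the surrounding code.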