Andrew wrote:
I've noticed that the TaskTracker moves all unpacked jars into ${hadoop.tmp.dir}/mapred/local/taskTracker.

We are using a lot of external libraries that are deployed via the "-libjars" option. The total number of files after unpacking is about 20,000.

After running a number of jobs, tasks start to be killed with a timeout reason ("Task attempt_200901281518_0011_m_000173_2 failed to report status for 601 seconds. Killing!"). All killed tasks are in the "initializing" state. I looked through the TaskTracker logs and found messages like this:


Thread 20926 (Thread-10368):
  State: BLOCKED
  Blocked count: 3611
  Waited count: 24
  Blocked on java.lang.ref.reference$l...@e48ed6
  Blocked by 20882 (Thread-10341)
  Stack:
    java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
    java.lang.StringCoding.encode(StringCoding.java:272)
    java.lang.String.getBytes(String.java:947)
    java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
    java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
    java.io.File.isDirectory(File.java:754)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:427)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)


This is exactly as in HADOOP-4780. As I understand it, the patch adds code that keeps a map of directories along with their DU values, reducing the number of DU calls. That should help, but the process of deleting 20,000 files still takes too long. I manually deleted the archive after 10 jobs had run, and it took over 30 minutes on XFS. That is three times longer than the default task timeout!
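The stack trace above is the recursive disk-usage walk in FileUtil.getDU: one native file-system call per entry, so every recomputation over a tree with ~20,000 unpacked jar entries touches every file again. A minimal sketch of that pattern, together with the kind of per-directory DU cache I understand HADOOP-4780 to introduce (this is my own illustration, not Hadoop's actual code; the cache map and its lack of invalidation are assumptions):

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class DuSketch {
    // Naive recursive disk usage, similar in spirit to FileUtil.getDU:
    // one isDirectory()/length() native call per entry, so each
    // invocation walks the whole unpacked-jar tree again.
    static long getDU(File f) {
        if (!f.isDirectory()) {
            return f.length();
        }
        long size = 0;
        File[] entries = f.listFiles();
        if (entries != null) {
            for (File child : entries) {
                size += getDU(child);  // the recursion seen in the stack trace
            }
        }
        return size;
    }

    // Sketch of the HADOOP-4780 idea as I understand it: remember the
    // DU of directories already measured so repeated checks are cheap.
    // (Invalidation on directory change is omitted here.)
    private static final Map<String, Long> duCache = new HashMap<>();

    static long cachedDU(File dir) {
        return duCache.computeIfAbsent(dir.getAbsolutePath(), p -> getDU(dir));
    }
}
```

The cache trades staleness for speed: the second and later size checks cost one map lookup instead of tens of thousands of native calls, which is why it helps the monitoring thread but does nothing for the slow deletion of the files themselves.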

Is there a way to prevent the unpacking of jars? Or at least not to keep the unpacked archive around? Or is there any better way to solve this problem?

Hadoop version: 0.19.0.


Right now, there is no way to stop DistributedCache from unpacking jars. I think it should have an option (through configuration) controlling whether to unpack or not.
Can you raise a jira for the same?
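If such an option were added, the switch might look like this in hadoop-site.xml (the property name "mapred.jar.unpack" is purely hypothetical and does not exist in Hadoop 0.19; it only illustrates the proposed knob):

```xml
<!-- Hypothetical: this property does not exist in Hadoop 0.19; it only
     illustrates the configurable unpack behavior proposed above. -->
<property>
  <name>mapred.jar.unpack</name>
  <value>false</value>
  <description>If false, leave job jars packed instead of expanding
  them under mapred/local/taskTracker.</description>
</property>
```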

Thanks
Amareshwari
