Andrew wrote:
I've noticed that the TaskTracker moves all unpacked jars into ${hadoop.tmp.dir}/mapred/local/taskTracker.

We are using a lot of external libraries that are deployed via the "-libjars" option. The total number of files after unpacking is about 20,000.

After running a number of jobs, tasks start to be killed with a timeout reason ("Task attempt_200901281518_0011_m_000173_2 failed to report status for 601 seconds. Killing!"). All killed tasks are in the "initializing" state. I looked through the TaskTracker logs and found messages like this:


Thread 20926 (Thread-10368):
  State: BLOCKED
  Blocked count: 3611
  Waited count: 24
  Blocked on java.lang.ref.reference$l...@e48ed6
  Blocked by 20882 (Thread-10341)
  Stack:
    java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
    java.lang.StringCoding.encode(StringCoding.java:272)
    java.lang.String.getBytes(String.java:947)
    java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
    java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
    java.io.File.isDirectory(File.java:754)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:427)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)


This is exactly as in HADOOP-4780. As I understand it, the patch adds code that keeps a map of directories along with their DU values, reducing the number of DU calls. That should help, but the process of deleting 20,000 files still takes too long. I manually deleted the archive after 10 jobs had run, and it took over 30 minutes on XFS. That is three times longer than the default task timeout!
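The stack trace above is the recursive disk-usage walk in FileUtil.getDU: one native file-system call per entry, so every recomputation over a tree with ~20,000 unpacked jar entries touches every file again. A minimal sketch of that pattern, together with the kind of per-directory DU cache I understand HADOOP-4780 to introduce (this is my own illustration, not Hadoop's actual code; the cache map and its lack of invalidation are assumptions):

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class DuSketch {
    // Naive recursive disk usage, similar in spirit to FileUtil.getDU:
    // one isDirectory()/length() native call per entry, so each
    // invocation walks the whole unpacked-jar tree again.
    static long getDU(File f) {
        if (!f.isDirectory()) {
            return f.length();
        }
        long size = 0;
        File[] entries = f.listFiles();
        if (entries != null) {
            for (File child : entries) {
                size += getDU(child);  // the recursion seen in the stack trace
            }
        }
        return size;
    }

    // Sketch of the HADOOP-4780 idea as I understand it: remember the
    // DU of directories already measured so repeated checks are cheap.
    // (Invalidation on directory change is omitted here.)
    private static final Map<String, Long> duCache = new HashMap<>();

    static long cachedDU(File dir) {
        return duCache.computeIfAbsent(dir.getAbsolutePath(), p -> getDU(dir));
    }
}
```

The cache trades staleness for speed: the second and later size checks cost one map lookup instead of tens of thousands of native calls, which is why it helps the monitoring thread but does nothing for the slow deletion of the files themselves.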

Is there a way to prevent the unpacking of jars? Or at least not to keep the unpacked archive around? Or is there any better way to solve this problem?

Hadoop version: 0.19.0.


Right now, there is no way to stop DistributedCache from unpacking jars. I think it should have an option (through configuration) controlling whether to unpack or not.
Can you raise a jira for the same?
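If such an option were added, the switch might look like this in hadoop-site.xml (the property name "mapred.jar.unpack" is purely hypothetical and does not exist in Hadoop 0.19; it only illustrates the proposed knob):

```xml
<!-- Hypothetical: this property does not exist in Hadoop 0.19; it only
     illustrates the configurable unpack behavior proposed above. -->
<property>
  <name>mapred.jar.unpack</name>
  <value>false</value>
  <description>If false, leave job jars packed instead of expanding
  them under mapred/local/taskTracker.</description>
</property>
```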

Thanks
Amareshwari
