I changed dfs.datanode.max.xcievers to 100000 and mapred.job.reuse.jvm.num.tasks to 10. The job still fails with the same set of errors as before.
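For reference, this is roughly what the entries look like now - the xcievers change is in hdfs-site.xml on the datanodes, and the JVM reuse setting is in mapred-site.xml (it can also be set per job); the values are the ones mentioned above:

<!-- hdfs-site.xml (datanodes) -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>100000</value>
</property>

<!-- mapred-site.xml (or the per-job configuration) -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>10</value>
</property>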
Sumanth

On Thu, Feb 16, 2012 at 7:41 PM, Srinivas Surasani <[email protected]> wrote:
> Sumanth,
>
> For a quick check, try setting this to a much bigger value (1M), though
> this is not good practice (the datanode may run out of memory).
>
> On Thu, Feb 16, 2012 at 10:21 PM, Sumanth V <[email protected]> wrote:
> > Hi Srinivas,
> >
> > The *dfs.datanode.max.xcievers* value is set to 4096 in hdfs-site.xml.
> >
> > Sumanth
> >
> > On Thu, Feb 16, 2012 at 7:11 PM, Srinivas Surasani <[email protected]> wrote:
> >> Sumanth, I think Sreedhar is pointing to the "dfs.datanode.max.xcievers"
> >> property in hdfs-site.xml. Try setting this property to a higher value.
> >>
> >> On Thu, Feb 16, 2012 at 9:51 PM, Sumanth V <[email protected]> wrote:
> >> > The ulimit values are set much higher than the defaults.
> >> > Here are the /etc/security/limits.conf contents -
> >> > * - nofile 64000
> >> > hdfs - nproc 32768
> >> > hdfs - stack 10240
> >> > hbase - nproc 32768
> >> > hbase - stack 10240
> >> > mapred - nproc 32768
> >> > mapred - stack 10240
> >> >
> >> > Sumanth
> >> >
> >> > On Thu, Feb 16, 2012 at 6:48 PM, Sree K <[email protected]> wrote:
> >> >> Sumanth,
> >> >>
> >> >> You may want to check the ulimit setting for open files.
> >> >>
> >> >> Set it to a higher value if it is at the default of 1024.
> >> >>
> >> >> Regards,
> >> >> Sreedhar
> >> >>
> >> >> ________________________________
> >> >> From: Sumanth V <[email protected]>
> >> >> To: [email protected]
> >> >> Sent: Thursday, February 16, 2012 6:25 PM
> >> >> Subject: ENOENT: No such file or directory
> >> >>
> >> >> Hi,
> >> >>
> >> >> We have a 20-node Hadoop cluster running CDH3 U2. Some of our jobs
> >> >> are failing with the following errors. We noticed that we consistently
> >> >> hit this error when the total number of map tasks in a particular job
> >> >> exceeds the total map task capacity of the cluster. Jobs whose number of
> >> >> map tasks is lower than the total map task capacity fare well.
> >> >>
> >> >> Here are the lines from the JobTracker log file -
> >> >>
> >> >> 2012-02-16 15:05:28,695 INFO org.apache.hadoop.mapred.TaskInProgress:
> >> >> Error from attempt_201202161408_0004_m_000169_0: ENOENT: No such file or directory
> >> >>     at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
> >> >>     at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:172)
> >> >>     at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:215)
> >> >>     at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:288)
> >> >>     at org.apache.hadoop.mapred.Child.main(Child.java:245)
> >> >>
> >> >> Here is the TaskTracker log -
> >> >>
> >> >> 2012-02-16 15:05:22,126 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> >> >> jvm_201202161408_0004_m_1467721896 exited with exit code 0. Number of tasks it ran: 1
> >> >> 2012-02-16 15:05:22,127 WARN org.apache.hadoop.mapred.TaskLogsTruncater:
> >> >> Exception in truncateLogs while getting allLogsFileDetails(). Ignoring the
> >> >> truncation of logs of this process.
> >> >> java.io.FileNotFoundException:
> >> >> /usr/lib/hadoop-0.20/logs/userlogs/job_201202161408_0004/attempt_201202161408_0004_m_000112_1/log.index
> >> >> (No such file or directory)
> >> >>     at java.io.FileInputStream.open(Native Method)
> >> >>     at java.io.FileInputStream.<init>(FileInputStream.java:120)
> >> >>     at java.io.FileReader.<init>(FileReader.java:55)
> >> >>     at org.apache.hadoop.mapred.TaskLog.getAllLogsFileDetails(TaskLog.java:110)
> >> >>     at org.apache.hadoop.mapred.TaskLogsTruncater.getAllLogsFileDetails(TaskLogsTruncater.java:353)
> >> >>     at org.apache.hadoop.mapred.TaskLogsTruncater.shouldTruncateLogs(TaskLogsTruncater.java:98)
> >> >>     at org.apache.hadoop.mapreduce.server.tasktracker.userlogs.UserLogManager.doJvmFinishedAction(UserLogManager.java:163)
> >> >>     at org.apache.hadoop.mapreduce.server.tasktracker.userlogs.UserLogManager.processEvent(UserLogManager.java:137)
> >> >>     at org.apache.hadoop.mapreduce.server.tasktracker.userlogs.UserLogManager.monitor(UserLogManager.java:132)
> >> >>     at org.apache.hadoop.mapreduce.server.tasktracker.userlogs.UserLogManager$1.run(UserLogManager.java:66)
> >> >> 2012-02-16 15:05:22,228 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201202161408_0004_m_000006_0 0.0%
> >> >> 2012-02-16 15:05:22,228 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201202161408_0004_m_000053_0 0.0%
> >> >> 2012-02-16 15:05:22,329 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201202161408_0004_m_000057_0 0.0%
> >> >>
> >> >> Any help in resolving this issue would be highly appreciated! Let me
> >> >> know if any other config info is needed.
> >> >>
> >> >> Thanks,
> >> >> Sumanth
> >>
> >> --
> >> -- Srinivas
> >> [email protected]
> >
>
> --
> -- Srinivas
> [email protected]
>
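P.S. On the ulimit question - one way to confirm the limits that actually apply to the running daemons (a process keeps the limits it was started with, so limits.conf edits only take effect for sessions started afterwards) is to read them from /proc. A rough sketch, assuming a Linux box and the standard CDH3 process names:

# Effective nofile/nproc limits of the running TaskTracker and DataNode JVMs
for pid in $(pgrep -f TaskTracker) $(pgrep -f DataNode); do
    echo "=== pid $pid ==="
    grep -E 'Max (open files|processes)' /proc/$pid/limits
done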
