No task may execute due to an Integer overflow possibility
----------------------------------------------------------
Key: MAPREDUCE-2236
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2236
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.2
Environment: Linux, Hadoop 0.20.2
Reporter: Harsh J Chouraria
Assignee: Harsh J Chouraria
Priority: Critical
Fix For: 0.23.0
If the maximum number of attempts is set to Integer.MAX_VALUE, an integer
overflow occurs inside TaskInProgress. As a result, the cluster never attempts
any task and the map tasks stay in the pending state forever.
For example, here's a job driver that causes this:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.NullOutputFormat;
@SuppressWarnings("deprecation")
public class IntegerOverflow {

  /**
   * @param args
   * @throws IOException
   */
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf();

    // Create a small input file if it does not exist yet.
    Path inputPath = new Path("ignore");
    FileSystem fs = FileSystem.get(conf);
    if (!fs.exists(inputPath)) {
      FSDataOutputStream out = fs.create(inputPath);
      out.writeChars("Test");
      out.close();
    }

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(NullOutputFormat.class);
    FileInputFormat.addInputPath(conf, inputPath);

    conf.setMapperClass(IdentityMapper.class);
    conf.setNumMapTasks(1);
    // Problem inducing line follows.
    conf.setMaxMapAttempts(Integer.MAX_VALUE);

    // No reducer in this test, although setMaxReduceAttempts leads
    // to the same problem.
    conf.setNumReduceTasks(0);

    JobClient.runJob(conf);
  }
}
{code}
The above code will not let any map task run. Additionally, a warning that
clearly shows the overflow is written to the JobTracker log:
{code}
2010-12-30 00:59:07,836 WARN org.apache.hadoop.mapred.TaskInProgress: Exceeded
limit of -2147483648 (plus 0 killed) attempts for the tip
'task_201012300058_0001_m_000000'
{code}
The issue lies in the TaskInProgress class
(/o/a/h/mapred/TaskInProgress.java), at line 1018 (trunk), in the
getTaskToRun(String taskTracker) method:
{code}
public Task getTaskToRun(String taskTracker) throws IOException {
  // Create the 'taskid'; do not count the 'killed' tasks against the job!
  TaskAttemptID taskid = null;
  /* ============ THIS LINE v ====================================== */
  if (nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks)) {
  /* ============ THIS LINE ^ ====================================== */
    // Make sure that the attempts are unique across restarts
    int attemptId = job.getNumRestarts() * NUM_ATTEMPTS_PER_RESTART +
        nextTaskId;
    taskid = new TaskAttemptID(id, attemptId);
    ++nextTaskId;
  } else {
    LOG.warn("Exceeded limit of " + (MAX_TASK_EXECS + maxTaskAttempts) +
             " (plus " + numKilledTasks + " killed)" +
             " attempts for the tip '" + getTIPId() + "'");
    return null;
  }
{code}
All three operands of the addition are of type int, so when one of them is
Integer.MAX_VALUE the sum overflows and wraps around to a negative value. The
condition is then false for any non-negative nextTaskId, so the method logs
the warning and returns null.
One possible fix would be to widen one of these variables to long, so that the
addition does not overflow.
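To make the arithmetic concrete, here is a minimal standalone sketch of the
condition quoted above, next to a variant that widens the sum to long. It is
an illustration only and not a patch against TaskInProgress: the class name
AttemptLimitOverflow and both helper methods are hypothetical, and
MAX_TASK_EXECS = 1 is an assumed value chosen because it is consistent with
the -2147483648 limit shown in the log.

```java
public class AttemptLimitOverflow {
  // Assumed value for illustration; consistent with the logged limit.
  static final int MAX_TASK_EXECS = 1;

  // Mirrors the int-only arithmetic of the condition in getTaskToRun.
  static boolean canRunWithIntSum(int nextTaskId, int maxTaskAttempts,
                                  int numKilledTasks) {
    return nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks);
  }

  // Casting the first operand to long makes the whole sum 64-bit,
  // so it cannot wrap for any combination of int inputs.
  static boolean canRunWithLongSum(int nextTaskId, int maxTaskAttempts,
                                   int numKilledTasks) {
    return nextTaskId
        < ((long) MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks);
  }

  public static void main(String[] args) {
    int max = Integer.MAX_VALUE;
    // The int sum wraps around to Integer.MIN_VALUE.
    System.out.println(MAX_TASK_EXECS + max);         // -2147483648
    System.out.println(canRunWithIntSum(0, max, 0));  // false
    System.out.println(canRunWithLongSum(0, max, 0)); // true
  }
}
```

With the int sum, even the very first attempt (nextTaskId == 0) is rejected,
which matches the behaviour described in this report. Widening only the
comparison leaves the types of the existing fields unchanged; the sum inside
the LOG.warn message would need the same treatment to print a sensible limit.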