[ https://issues.apache.org/jira/browse/HADOOP-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653642#action_12653642 ]
Ruyue Ma commented on HADOOP-4620:
----------------------------------
If the mapper has no data to handle, we do not need to start the native process
at all. This would improve performance and also avoid this problem.
So I suggest starting the native process in PipeMapper.map() instead.
We can move the following part of the PipeMapRed code into the map function:
    // Start the process
    ProcessBuilder builder = new ProcessBuilder(argvSplit);
    builder.environment().putAll(childEnv.toMap());
    sim = builder.start();

    clientOut_ = new DataOutputStream(new BufferedOutputStream(
        sim.getOutputStream(), BUFFER_SIZE));
    clientIn_ = new DataInputStream(new BufferedInputStream(
        sim.getInputStream(), BUFFER_SIZE));
    clientErr_ = new DataInputStream(new BufferedInputStream(
        sim.getErrorStream()));
    startTime_ = System.currentTimeMillis();

    errThread_ = new MRErrorThread();
    errThread_.start();
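
A rough, self-contained sketch of the lazy-start idea is below. It is not the
actual streaming code: LazyProcessMapper, startProcessIfNeeded(), isStarted()
and the tab-separated record format are hypothetical names and choices made up
for illustration, and the child's stdout/stderr are simply inherited instead of
being read by output and error threads as PipeMapRed does.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical stand-in for a streaming mapper; not the real PipeMapper. */
class LazyProcessMapper {
    private static final int BUFFER_SIZE = 128 * 1024; // assumed buffer size

    private final List<String> argv;
    private Process sim;                  // stays null until the first record
    private DataOutputStream clientOut;

    LazyProcessMapper(List<String> argv) {
        this.argv = argv;
    }

    /** True once the native process has been launched. */
    boolean isStarted() {
        return sim != null;
    }

    /** Launch the child only on first use. */
    private void startProcessIfNeeded() throws IOException {
        if (sim != null) {
            return;
        }
        ProcessBuilder builder = new ProcessBuilder(argv);
        // Simplification: let the child write to our stdout/stderr directly.
        builder.redirectOutput(ProcessBuilder.Redirect.INHERIT);
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        sim = builder.start();
        clientOut = new DataOutputStream(
            new BufferedOutputStream(sim.getOutputStream(), BUFFER_SIZE));
    }

    /** Called once per input record; the first call starts the process. */
    void map(String key, String value) throws IOException {
        startProcessIfNeeded();
        String record = key + "\t" + value + "\n";
        clientOut.write(record.getBytes(StandardCharsets.UTF_8));
    }

    /** With empty input there is no process, so there is nothing to wait for. */
    int close() throws IOException, InterruptedException {
        if (sim == null) {
            return 0;
        }
        clientOut.close();       // flushes and signals EOF on the child's stdin
        return sim.waitFor();
    }
}
```

With this shape, a mapper that receives no records never launches the child,
so close() returns immediately. Whether the real fix should take this route or
instead repair the shutdown path for an idle child is a separate question.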
> Streaming mapper never completes if the mapper does not write to stdout
> -----------------------------------------------------------------------
>
> Key: HADOOP-4620
> URL: https://issues.apache.org/jira/browse/HADOOP-4620
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.2
> Reporter: Runping Qi
> Assignee: Ravi Gummadi
>
> A mapper of a streaming job has empty input data and thus it produces no
> output.
> The task never completes.
> The following are the last two lines from the task log:
> 2008-11-07 21:59:48,254 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/perl, xxx]
> 2008-11-07 21:59:48,330 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
>