[ https://issues.apache.org/jira/browse/HADOOP-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587084#action_12587084 ]
Owen O'Malley commented on HADOOP-3196:
---------------------------------------
Actually, there were so many problems with tracking progress in streaming that
the task time-out was disabled by default:
{code}
// All streaming jobs have, by default, no time-out for tasks
jobConf_.setLong("mapred.task.timeout", 0);
{code}
I think it is OK to remove the flushes. The only downside I can see is that if
your input format does slow, low-volume fetching, you'll increase the latency
of the map.
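To make the trade-off concrete, here is a small sketch using plain java.io streams rather than the real PipeMapper wiring. It assumes the child's stdin can be modelled by an in-memory sink, and the class and method names (`FlushDemo`, `writeCalls`, `bytesVisibleAfterOneUnflushedRecord`) are hypothetical. It counts how many writes reach the underlying stream with and without a `flush()` after every record, and shows that an unflushed record is not yet visible to the reader, which is the extra map latency mentioned above.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FlushDemo {
    // Counts write() calls that reach the wrapped stream. Here it stands in for the
    // pipe to the streaming child, where each such call would be a write(2) syscall.
    static final class CountingOutputStream extends OutputStream {
        private final OutputStream out;
        int writeCalls = 0;

        CountingOutputStream(OutputStream out) { this.out = out; }

        @Override public void write(int b) throws IOException { writeCalls++; out.write(b); }

        @Override public void write(byte[] b, int off, int len) throws IOException {
            writeCalls++;
            out.write(b, off, len);
        }

        @Override public void flush() throws IOException { out.flush(); }
    }

    // Writes `records` key<TAB>value<NEWLINE> lines through a 128 KB buffer, optionally
    // flushing after every record, and returns how many writes reached the sink.
    static int writeCalls(int records, boolean flushEachRecord) {
        CountingOutputStream sink = new CountingOutputStream(new ByteArrayOutputStream());
        try (BufferedOutputStream clientOut = new BufferedOutputStream(sink, 128 * 1024)) {
            for (int i = 0; i < records; i++) {
                clientOut.write(("key" + i + "\tvalue" + i + "\n").getBytes(StandardCharsets.UTF_8));
                if (flushEachRecord) {
                    clientOut.flush(); // one write to the sink per record
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writeCalls;
    }

    // The latency side of the trade-off: a record written without flush() stays in
    // the buffer, so the reader (the child process) has not received any of it yet.
    static int bytesVisibleAfterOneUnflushedRecord() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream clientOut = new BufferedOutputStream(sink, 128 * 1024);
        try {
            clientOut.write("k\tv\n".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("writes with per-record flush:    " + writeCalls(10_000, true));
        System.out.println("writes without per-record flush: " + writeCalls(10_000, false));
        System.out.println("bytes visible before flush:      " + bytesVisibleAfterOneUnflushedRecord());
    }
}
```

With the per-record flush, every record costs one write to the sink; without it, writes happen only when the buffer fills or the stream is closed, at the cost of records waiting in the buffer.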
> get rid of excessive flushes from PipeMapper/Reducer
> ----------------------------------------------------
>
> Key: HADOOP-3196
> URL: https://issues.apache.org/jira/browse/HADOOP-3196
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.16.2
> Reporter: Joydeep Sen Sarma
>
> There's a flush on the buffered output streams in the mapper/reducer for every
> row of data:
> // 2/4 Hadoop to Tool
>
> if (numExceptions_ == 0) {
>   if (!this.ignoreKey) {
>     write(key);
>     clientOut_.write('\t');
>   }
>   write(value);
>   if (!this.skipNewline) {
>     clientOut_.write('\n');
>   }
>   clientOut_.flush();
> } else {
>   numRecSkipped_++;
> }
> I tried to measure the impact of removing this. The number of context switches
> reported by vmstat shows a marked decline.
> With flush (10-second intervals):
>  r  b  swpd   free   buff    cache  si  so    bi     bo    in     cs  us  sy  id  wa
>  4  2   784  23140  83352  3114648   0   0  4819  32397  1175  13220  59  11  13  17
>  1  2   784 129724  80704  3075696   0   0  4614  27196  1156  14797  49  11  19  21
>  4  0   784  24160  83440  3174880   0   0    96  36070  1337  10976  67  11   9  12
>  5  0   784 155872  84400  3158840   0   0   125  44084  1280  11044  68  14  10   8
>  2  1   784 365128  87048  2892032   0   0   119  38472  1317  11610  69  14  10   7
> Without flush:
>  r  b  swpd   free   buff    cache  si  so    bi     bo    in     cs  us  sy  id  wa
>  5  0   784  24652  56056  3217864   0   0   310  29499  1379   7603  76   9   7   8
>  5  3   784 118456  54568  3209992   0   0  3249  33426  1173   6828  63  11  12  14
>  0  2   784 227628  54820  3198560   0   0  7840  30063  1146   8899  60  10  15  15
>  3  1   784  25608  55048  3313512   0   0  3251  36276  1194   7915  60  10  15  15
>  1  2   784 197324  49968  3194572   0   0  4714  35479  1281   8204  62  13  12  13
> cs goes down by about 20-30%, but I'm having trouble measuring the overall speed
> improvement (too many variables due to speculative execution etc.; I need a
> better benchmark).
> Can't hurt.
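As a rough cross-check on the cs figure, this sketch averages the cs column (context switches per second, as vmstat reports it) over the five 10-second samples in each run. The values are copied by hand from the vmstat output above; the class name `VmstatCs` is illustrative only, and with just five unpaired samples per run the result is indicative, not a benchmark.

```java
import java.util.Arrays;

public class VmstatCs {
    // "cs" column (context switches per second) copied from the two vmstat runs above.
    static final int[] WITH_FLUSH = {13220, 14797, 10976, 11044, 11610};
    static final int[] WITHOUT_FLUSH = {7603, 6828, 8899, 7915, 8204};

    // Arithmetic mean of the samples in one run.
    static double mean(int[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    // Relative drop in mean cs, as a percentage of the with-flush mean.
    static double percentDrop() {
        double before = mean(WITH_FLUSH);
        double after = mean(WITHOUT_FLUSH);
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        System.out.printf("mean cs with flush:    %.1f%n", mean(WITH_FLUSH));
        System.out.printf("mean cs without flush: %.1f%n", mean(WITHOUT_FLUSH));
        System.out.printf("relative drop:         %.1f%%%n", percentDrop());
    }
}
```

Comparing means smooths over the interval-to-interval swings visible in both runs, but it cannot separate the flush change from other load differences between the two runs.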
--
This message is automatically generated by JIRA.