[jira] Commented: (HADOOP-3196) get rid of excessive flushes from PipeMapper/Reducer

Owen O'Malley (JIRA) Wed, 09 Apr 2008 01:09:17 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587084#action_12587084
 ]


Owen O'Malley commented on HADOOP-3196:
---------------------------------------

Actually, there were so many problems with tracking progress on streaming that 
the task timeout was disabled by default for streaming jobs. 

{code}
    // All streaming jobs have, by default, no time-out for tasks
    jobConf_.setLong("mapred.task.timeout", 0);
{code}

I think it is ok to remove the flushes. The only downside that I can see is 
that if your input format does slow, low-volume fetching, records will sit in 
the output buffer longer, which increases the latency of the map.
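
To make the trade-off concrete, here is a minimal sketch (not the streaming 
code itself) that writes tab-separated records through a BufferedOutputStream 
and counts how often the buffer hands data to the underlying stream, with and 
without a flush per record. The class FlushCost, the CountingSink helper, the 
8192-byte buffer size and the sample key/value are all assumptions made for 
illustration; in PipeMapper the underlying stream is the pipe to the child 
process, so each extra write is a system call and a likely context switch.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FlushCost {
    /** Hypothetical sink standing in for the pipe; counts writes it receives. */
    static final class CountingSink extends OutputStream {
        int writeCalls = 0;
        @Override public void write(int b) { writeCalls++; }
        @Override public void write(byte[] b, int off, int len) { writeCalls++; }
    }

    /** Writes `records` key/value lines and returns the number of sink writes. */
    static int writesToSink(int records, boolean flushEachRecord) {
        CountingSink sink = new CountingSink();
        byte[] key = "key".getBytes(StandardCharsets.UTF_8);
        byte[] value = "value".getBytes(StandardCharsets.UTF_8);
        // Buffer size is an assumption; closing the stream flushes the remainder.
        try (BufferedOutputStream out = new BufferedOutputStream(sink, 8192)) {
            for (int i = 0; i < records; i++) {
                out.write(key);
                out.write('\t');
                out.write(value);
                out.write('\n');
                if (flushEachRecord) {
                    out.flush(); // the per-row flush under discussion
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) {
        System.out.println("flush per record: " + writesToSink(1000, true));
        System.out.println("flush on close:   " + writesToSink(1000, false));
    }
}
```

With the per-record flush the sink sees one write per record; without it, 
writes happen only when the buffer fills or the stream is closed. That is also 
where the latency downside comes from: a slow input source may leave a record 
in the buffer for a long time before the child process sees it.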

> get rid of excessive flushes from PipeMapper/Reducer
> ----------------------------------------------------
>
>                 Key: HADOOP-3196
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3196
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.16.2
>            Reporter: Joydeep Sen Sarma
>
> there's a flush on the buffered output streams in mapper/reducer for every 
> row of data.
>       // 2/4 Hadoop to Tool
>       if (numExceptions_ == 0) {
>         if (!this.ignoreKey) {
>           write(key);
>           clientOut_.write('\t');
>         }
>         write(value);
>         if(!this.skipNewline) {
>             clientOut_.write('\n');
>         }
>         clientOut_.flush();
>       } else {
>         numRecSkipped_++;
>       }
> tried to measure impact of removing this. number of context switches reported 
> by vmstat shows marked decline. 
> with flush (10 second intervals):
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  4  2    784  23140  83352 3114648    0    0  4819 32397 1175 13220 59 11 13 17
>  1  2    784 129724  80704 3075696    0    0  4614 27196 1156 14797 49 11 19 21
>  4  0    784  24160  83440 3174880    0    0    96 36070 1337 10976 67 11  9 12
>  5  0    784 155872  84400 3158840    0    0   125 44084 1280 11044 68 14 10  8
>  2  1    784 365128  87048 2892032    0    0   119 38472 1317 11610 69 14 10  7
> without flush:
>  5  0    784  24652  56056 3217864    0    0   310 29499 1379  7603 76  9  7  8
>  5  3    784 118456  54568 3209992    0    0  3249 33426 1173  6828 63 11 12 14
>  0  2    784 227628  54820 3198560    0    0  7840 30063 1146  8899 60 10 15 15
>  3  1    784  25608  55048 3313512    0    0  3251 36276 1194  7915 60 10 15 15
>  1  2    784 197324  49968 3194572    0    0  4714 35479 1281  8204 62 13 12 13
> cs goes down by about 20-30%. but having trouble measuring overall speed 
> improvement (too many variables due to spec. execution etc. - need better 
> benchmark).
> can't hurt.
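
The size of the drop can be checked directly from the cs column of the vmstat 
samples quoted above. The sketch below hard-codes those ten values and compares 
the means; the class and method names are hypothetical. On these samples the 
means come out to roughly 12,300 with the flush and 7,900 without, a drop of 
about 36%, somewhat more than the 20-30% estimate. With only five samples per 
run this is a rough indication rather than a benchmark.

```java
import java.util.Arrays;

class ContextSwitchDelta {
    // cs column from the vmstat output quoted in the issue (10 second intervals).
    static final int[] WITH_FLUSH = {13220, 14797, 10976, 11044, 11610};
    static final int[] WITHOUT_FLUSH = {7603, 6828, 8899, 7915, 8204};

    /** Arithmetic mean of the samples, or NaN for an empty array. */
    static double mean(int[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    /** Percentage by which the mean of `after` is lower than the mean of `before`. */
    static double percentDrop(int[] before, int[] after) {
        return 100.0 * (1.0 - mean(after) / mean(before));
    }

    public static void main(String[] args) {
        System.out.printf("mean cs with flush:    %.1f%n", mean(WITH_FLUSH));
        System.out.printf("mean cs without flush: %.1f%n", mean(WITHOUT_FLUSH));
        System.out.printf("drop: %.1f%%%n", percentDrop(WITH_FLUSH, WITHOUT_FLUSH));
    }
}
```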

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
