[ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831644#action_12831644 ]
Todd Lipcon commented on MAPREDUCE-1266:
----------------------------------------
bq. if you are using jvm reuse, then that 1s disappears, right?
Not really, since JVM reuse doesn't reuse between maps and reduces.
The time sequence of a small job looks like:
Client:
  Submit job
JT:
  Create tasks ("initialize job") on JT
  wait for a TT to heartbeat
TT:
  start JVM
child:
  process map task
TT:
  send accelerated heartbeat once map task is complete (I forget whether
  this is in 0.20 or came later)
  receive reduce task, start reduce JVM (regardless of JVM reuse)
child:
  process reduce task
TT:
  send completion heartbeat
I guess there are also some setup/cleanup tasks going on in there as well.
Since we're talking about a hypothetical one-map, one-reduce job, a shorter
heartbeat interval only cuts down the time between initializing the job and
getting the first JVM started on a TT.
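As a rough illustration of the sequence above, here is a back-of-the-envelope
model in Python. It is only a sketch: the function names and the default
timings (1s per JVM start, 0.5s per task) are assumptions made up for the
example, not measurements or Hadoop code, and it ignores job initialization,
setup/cleanup tasks and the final completion heartbeat.

```python
def expected_first_heartbeat_wait(interval_s: float) -> float:
    """Mean wait for the next TT heartbeat, assuming the job becomes
    schedulable at a uniformly random point within the heartbeat period."""
    return interval_s / 2.0


def estimated_job_latency(interval_s: float,
                          jvm_start_s: float = 1.0,  # assumed, illustrative
                          map_s: float = 0.5,        # assumed, illustrative
                          reduce_s: float = 0.5      # assumed, illustrative
                          ) -> float:
    """Approximate latency of a one-map, one-reduce job.

    Two JVM starts are counted because JVM reuse does not carry over from
    the map to the reduce. No wait is charged between map and reduce,
    assuming the accelerated heartbeat fires as soon as the map completes.
    """
    wait = expected_first_heartbeat_wait(interval_s)
    return wait + 2 * jvm_start_s + map_s + reduce_s


if __name__ == "__main__":
    for interval in (3.0, 0.5):
        print(interval, estimated_job_latency(interval))
```

Under these assumptions the only term that depends on the heartbeat interval
is the initial wait, which is why the saving for a tiny job is bounded by
roughly one heartbeat period.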
In a multi-mapper or multi-reducer job, the cost shows up in how long it
takes for all of the tasks to get scheduled: some schedulers assign only one
task per heartbeat. The fair scheduler after MAPREDUCE-706 can assign
multiple tasks in a single heartbeat, which should help substantially.
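To put a rough number on that, the sketch below estimates how many heartbeat
rounds it takes to hand out all tasks. The function and parameter names are
hypothetical, and the model deliberately ignores slot limits, task
completions and accelerated heartbeats, so treat the result as a crude upper
bound rather than a prediction.

```python
import math


def heartbeat_rounds(num_tasks: int, num_trackers: int,
                     tasks_per_heartbeat: int = 1) -> int:
    """Rounds of heartbeats needed to assign every task, assuming each TT
    heartbeats once per round and receives at most tasks_per_heartbeat."""
    per_round = num_trackers * tasks_per_heartbeat
    return math.ceil(num_tasks / per_round)


def scheduling_delay_s(num_tasks: int, num_trackers: int, interval_s: float,
                       tasks_per_heartbeat: int = 1) -> float:
    """Upper bound on the time until the last task has been assigned."""
    rounds = heartbeat_rounds(num_tasks, num_trackers, tasks_per_heartbeat)
    return rounds * interval_s


if __name__ == "__main__":
    # 8 tasks on a single TT with a 3s heartbeat:
    print(scheduling_delay_s(8, 1, 3.0))                         # one task per heartbeat
    print(scheduling_delay_s(8, 1, 3.0, tasks_per_heartbeat=4))  # multiple per heartbeat
```

In this toy model, shortening the interval and assigning several tasks per
heartbeat both reduce the delay, and the two effects multiply.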
> Allow heartbeat interval smaller than 3 seconds for tiny clusters
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-1266
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1266
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobtracker, task, tasktracker
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Priority: Minor
>
> For small clusters, the heartbeat interval has a large effect on job latency.
> This is especially true on pseudo-distributed or other "tiny" (<5 nodes)
> clusters. It's not a big deal for production, but new users would have a
> happier first experience if Hadoop seemed snappier.
> I'd like to change the minimum heartbeat interval from 3.0 seconds to perhaps
> 0.5 seconds (but have it governed by an undocumented config parameter in case
> people don't like this change). The cluster size-based ramp up of interval
> will maintain the current scalable behavior for large clusters with no
> negative effect.
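One way to picture the cluster size-based ramp-up described in the issue is
a linear ramp clamped at a configurable minimum. The sketch below is an
assumption for illustration only: the target of 100 heartbeats per second at
the JT, the parameter names and the exact formula are made up here and are
not taken from the JobTracker source.

```python
def heartbeat_interval_ms(cluster_size: int,
                          min_interval_ms: int = 500,        # proposed floor from the issue
                          heartbeats_per_second: int = 100   # assumed JT target rate
                          ) -> int:
    """Interval that keeps the aggregate heartbeat rate at the JT near
    heartbeats_per_second, but never drops below min_interval_ms."""
    scaled_ms = 1000 * cluster_size // heartbeats_per_second
    return max(scaled_ms, min_interval_ms)


if __name__ == "__main__":
    for size in (1, 4, 300, 1000):
        print(size,
              heartbeat_interval_ms(size, min_interval_ms=3000),
              heartbeat_interval_ms(size, min_interval_ms=500))
```

With a formula of this shape, lowering the minimum changes the interval only
for clusters small enough to hit the floor; larger clusters get the same
interval as before.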