Rod Taylor wrote:
The machine the namenode is running on does have very high load at
times. Do you recommend a separate box for the namenode and jobtracker
which runs strictly those items?
That would be optimal, but it shouldn't be required. If a tasktracker
or datanode is sluggish then its impact is small, but if the jobtracker
or namenode becomes sluggish the impact is systemic. That said, so long
as these don't crash, things should work. The problem is that the code
paths for recovery when namenodes and jobtrackers are sluggish have not
been tested as much.
What's in the jobtracker logs around this time? Did it report this
tasktracker as lost?
The jobtracker did not indicate such a thing (via an exception anyway).
Tasktracker connections seem to be established and dropped fairly
frequently. Perhaps this is what you mean?
No, there's a "lost tracker" message when the jobtracker times out a
tasktracker. These are bad, since the jobtracker then assumes that all
of the temporary map data at that tasktracker is gone, and re-schedules
those map tasks.
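
To make the timeout behaviour concrete, here is a rough sketch in Java.
It is not the actual jobtracker code: the class name, the method names
and the timeout value are all made up for illustration. It only models
the idea that a tracker which misses its heartbeat window is declared
lost, and that the map tasks it had completed are handed back for
re-scheduling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical model of "lost tracker" handling; not Hadoop's real classes.
class LostTrackerSketch {
    // Assumed expiry interval in milliseconds (illustrative value only).
    private final long timeoutMs;
    // Last heartbeat time seen from each tasktracker.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    // Map tasks whose temporary output lives on each tasktracker.
    private final Map<String, List<String>> completedMaps = new HashMap<>();

    LostTrackerSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Record that a tasktracker checked in at time nowMs.
    void heartbeat(String tracker, long nowMs) {
        lastHeartbeat.put(tracker, nowMs);
    }

    // Record that a map task finished and left its output on this tracker.
    void mapCompleted(String tracker, String mapTask) {
        completedMaps.computeIfAbsent(tracker, k -> new ArrayList<>()).add(mapTask);
    }

    // Declare trackers lost if they have been silent longer than timeoutMs,
    // and return the map tasks that must be run again elsewhere.
    List<String> expireLostTrackers(long nowMs) {
        List<String> toReschedule = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > timeoutMs) {
                System.out.println("lost tracker: " + entry.getKey());
                // The temporary map output on a lost tracker is assumed gone.
                List<String> maps = completedMaps.remove(entry.getKey());
                if (maps != null) {
                    toReschedule.addAll(maps);
                }
                it.remove();
            }
        }
        return toReschedule;
    }
}
```

The point of the sketch is the cost: a tracker that is merely slow to
report, for example because the jobtracker host is overloaded, is
treated the same as one that has died, so map work that already
finished gets redone.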
Doug