On Mar 1, 2008, at 12:05 PM, Amar Kamat wrote:
>> 3) Lastly, it would seem beneficial for jobs that have significant
>> startup overhead and memory requirements not to be run in separate
>> JVMs for each task. Along these lines, it looks like someone
>> submitted a patch for JVM reuse a while back, but it wasn't
>> committed? https://issues.apache.org/jira/browse/HADOOP-249

Most of the ideas in the patch for HADOOP-249 were committed as other
patches, but the issue has been left open precisely because the idea
still has merit. The patch was never stable enough to commit, and by now
it is hopelessly out of date. There are lots of little issues that would
need to be addressed before JVM reuse could happen.
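
For what it's worth, if reuse does land, the natural shape is probably a
per-job knob in the JobConf, so that jobs with heavy task setup can opt
in. A minimal sketch of what that could look like (the property name
here is purely illustrative, not an existing knob):

  import org.apache.hadoop.mapred.JobConf;

  public class ReuseSketch {
    public static void main(String[] args) {
      JobConf conf = new JobConf(ReuseSketch.class);
      // Illustrative property name; no such knob exists today.
      // The idea: how many tasks of this job one task JVM may run
      // before it is torn down. 1 would be the current
      // one-JVM-per-task behavior; -1 could mean "no limit".
      conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
    }
  }
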
>> Probably a question for the dev mailing list, but if I wanted to
>> modify Hadoop to allow threading tasks, rather than running
>> independent JVMs, is there any reason someone hasn't done this
>> yet? Or am I overlooking something?

> This is done to keep user code separate from the framework code.

Precisely. We don't want to go through the security manager in the
servers, so it is far easier to keep user code out of the servers.
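
To make that concrete, the isolation we get from child JVMs looks
roughly like this (a sketch, not the actual TaskTracker code; the class
and method names are made up). The server only ever sees an exit code,
so a System.exit(), an OutOfMemoryError, or a JNI crash in user code
takes down the child, never the server:

  import java.io.File;

  // Sketch only; not the actual TaskTracker code.
  public class ChildJvmLauncher {
    // Run one task in its own JVM; the server just watches the exit code.
    public static int runTask(String taskMainClass, File workDir)
        throws Exception {
      ProcessBuilder pb = new ProcessBuilder(
          "java", "-cp", System.getProperty("java.class.path"),
          taskMainClass);
      pb.directory(workDir);         // per-task working directory
      pb.redirectErrorStream(true);  // fold stderr into stdout for logs
      Process child = pb.start();
      return child.waitFor();        // nonzero exit: mark the task failed
    }
  }
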
> So if the user code develops a fault, the framework and the rest of
> the jobs function normally. Most jobs have a long run time, and hence
> the startup time is never a concern.

As long as the tasks belong to the same job (and therefore the same
user), sharing a JVM should be fine. One concern is that currently each
task gets its own working directory. Since Java can't change the working
directory of a running process, a shared JVM would have to clean up the
working directory between tasks. That will interact badly with the
debugging settings that let you keep the task files. However, as we
speed things up, reuse will become more important. We are already
starting to see sort maps that finish in 17 seconds, which means the 1
second of JVM startup is roughly a 6% overhead...
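
For illustration, here is the kind of bookkeeping a shared JVM would
need for the working-directory problem (a sketch with made-up names,
not Hadoop API). Since the process cannot chdir, each task gets an
absolute per-task directory to resolve its paths against, and the
runner deletes it between tasks unless the setting that keeps task
files is on:

  import java.io.File;

  // Sketch only; illustrative names, not Hadoop API.
  public class TaskWorkDirs {
    // Create an absolute per-task directory; tasks must resolve all
    // relative paths against it, since the JVM itself cannot chdir.
    public static File setUp(File jobDir, String taskId) {
      File workDir = new File(jobDir, taskId);
      workDir.mkdirs();
      return workDir;
    }

    // Remove the directory between tasks, unless the debugging
    // setting that keeps task files around is enabled.
    public static void tearDown(File workDir, boolean keepTaskFiles) {
      if (!keepTaskFiles) {
        deleteRecursively(workDir);
      }
    }

    private static void deleteRecursively(File f) {
      File[] children = f.listFiles();
      if (children != null) {
        for (File c : children) {
          deleteRecursively(c);
        }
      }
      f.delete();
    }
  }
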
-- Owen