Torsten Curdt wrote:
I'm a complete idiot when it comes to distributed computing, but I would
say it is easy to blow up a JVM when doing such distributed jobs (be it
through OOM or anything else).
Then restrict what people can do - at least Google went that route.
I don't know the specifics of what Google did :)
If you want to do that with Java, you have to restrict memory usage, CPU
usage and descriptor access within each in-VM task. That's a considerable
amount of work that likely implies writing a specific agent for the VM
(or rather an agent for a specific VM, because it's pretty unlikely that
you will get the same results across VMs), assuming it can then really be
enforced at the classloader level for each task (which looks insanely
complex to me once you have to consider allocations done at the parent
classloader level, etc.).
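
To illustrate how little you get without an agent: the standard
ThreadMXBean lets you observe per-thread CPU time, but there is no
enforcement and no per-task memory or descriptor accounting at all. A
minimal sketch (the task body and the polling interval are just
placeholders):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class TaskCpuWatch {
    public static void main(String[] args) throws Exception {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadCpuTimeSupported()) {
            mx.setThreadCpuTimeEnabled(true);
        }

        // Stand-in for a task running in-VM in its own thread.
        Thread task = new Thread(new Runnable() {
            public void run() {
                long x = 0;
                for (int i = 0; i < 50000000; i++) { x += i; }
            }
        }, "task-1");
        task.start();

        // We can watch the CPU time grow, but we cannot cap it from here,
        // and there is nothing equivalent for heap or file descriptors.
        while (task.isAlive()) {
            long cpuNanos = mx.getThreadCpuTime(task.getId());
            System.out.println("task-1 cpu so far: " + (cpuNanos / 1000000) + " ms");
            Thread.sleep(100);
        }
    }
}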
At least by forking a VM you can get some reasonably bounded control over
resource usage (or at least memory) without bringing everything down,
since a VM is already bounded to some degree.
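
For the record, this is roughly what bounding through a fork looks like.
A minimal sketch, assuming Java 8+ for the timed waitFor and a
hypothetical task class with a main method; the -Xmx value and the
timeout are arbitrary:

import java.io.File;
import java.util.concurrent.TimeUnit;

public class ForkedTaskRunner {
    public static int runTask(String taskClass) throws Exception {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";

        // The child VM gets its own hard memory bound; if it blows up,
        // only the child dies, not the tracker.
        ProcessBuilder pb = new ProcessBuilder(
                javaBin, "-Xmx256m",
                "-cp", System.getProperty("java.class.path"),
                taskClass);
        pb.inheritIO();
        Process child = pb.start();

        // Bound it in time as well: kill the child rather than waiting forever.
        if (!child.waitFor(10, TimeUnit.MINUTES)) {
            child.destroy();
            return -1;
        }
        return child.exitValue();
    }
}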
Failing jobs are not exactly uncommon, and running them in a sandboxed
environment with less risk for the tracker seems like a perfectly
reasonable choice. So yes, VM pooling certainly makes perfect sense for
that.
I am still not convinced - sorry
It's a bit like wanting to run JSPs in a separate JVM because they might
take down the servlet container.
That's a bit too fine-grained an analogy. I think it is more like deciding
whether to run n different webapps within the same VM or not: if one
webapp is a resource hog, separating it does not harm the n-1 other
applications, and you can either create another server instance for it or
move it to another node.
I know of environments with a large number of nodes (not related to
Hadoop) where they also reboot a set of nodes daily to ensure that all
machines are really in working condition (it's usually when a machine
reboots because of a failure that someone has to rush to it, because some
service was never registered to start at boot or things like that, so
this periodic check gives people a better idea of their response time to
failure). That depends on operational procedures, for sure.
I don't think this should be designed in the spirit that everything is
perfect in a perfect world, because we know it is not like that. There
will be a compromise between safety and performance, and having something
reasonably tolerant to failure is also a performance advantage.
Doing something as simple as a deleteOnExit in a task is enough, on some
VMs, to leak a few KB each time and have them stick around until the VM
dies (fixed in 1.5.0_10 if I remember correctly). Figuring out things
like that is likely to take a serious amount of time, considering it is
an internal leak that will not show up in your favorite Java profiler
either.
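
To make that concrete: deleteOnExit registers the path in a static list
inside the VM that is only processed at shutdown, so a long-lived task VM
that keeps registering temp files keeps accumulating entries. A toy
example (class and file names are made up):

import java.io.File;

public class DeleteOnExitLeak {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 100000; i++) {
            File tmp = File.createTempFile("task", ".tmp");
            // The registration lives for the whole life of the VM...
            tmp.deleteOnExit();
            // ...even though the file itself is already gone.
            tmp.delete();
        }
    }
}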
Bottom line: even if you are 100% sure of your own code, which is quite
unlikely (at least as far as I'm concerned), you don't know the
third-party code. So, without being totally paranoid, this is something
that cannot be ignored.
-- stephane