I've run into a situation where it would be helpful to set specific
configuration variables local to a data/task node. I've got a solution, but
I'm curious if there is a best practice around this and if I'm doing it in a
reasonable way.
Basically what we've got is a number of machines that have 2 Cores/4GB of
memory. For those boxes we have some options configured higher than the
default in mapred-site.xml (specifically mapred.child.java.opts set to
-Xmx1024m). We recently added a few additional boxes that have 2 Cores/2GB of
memory, and that mapred.child.java.opts value causes those boxes to swap, so
we'd like to configure them with mapred.child.java.opts set to -Xmx512m. What
I found is that if I simply change the value in mapred-site.xml on the
data/task node, it is overridden, but if I also mark the parameter with
<final>true</final>, the node's value is used. So effectively I've now got a
different mapred-site.xml on each of the nodes.
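
For reference, the override on the 2GB boxes looks roughly like the sketch below. It shows only the one property discussed here and omits the rest of the file. The comments describe my understanding of how <final> behaves, not something I've confirmed beyond the behavior described above.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml on a 2 Cores/2GB node (sketch; other properties omitted) -->
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <!-- Smaller child heap so the 2GB boxes stop swapping -->
    <value>-Xmx512m</value>
    <!-- Marking the property final keeps configuration loaded later
         (including the job's own settings) from overriding it -->
    <final>true</final>
  </property>
</configuration>
```

The 4GB boxes keep -Xmx1024m in their own copy of the file.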
My question is, is this a reasonable way of going about this? Is there a
best practice for dealing with minor node-specific configuration differences?
What I'd really want is not only a mapred-site.xml, but also a mapred-node.xml
that holds node-specific overrides. What I can't tell is whether I'd get what
I'm looking for by making all the site-level configuration changes in the name
node's mapred-site.xml and keeping only the node-local changes in the
node-specific mapred-site.xml files.
Any insight would be appreciated.
Andy