Craig,
While HOD does not do this automatically, please note that since you
are bringing up a Map/Reduce cluster on the allocated nodes, you can
submit map/reduce parameters with which to bring up the cluster when
allocating jobs. The relevant option is
--gridservice-mapred.server-params (or -M in shorthand). Please refer to
http://hadoop.apache.org/core/docs/r0.19.0/hod_user_guide.html#Options+for+Configuring+Hadoop
for details.
I was aware of this, but the issue is that unless you obtain dedicated
nodes (as above), this option is not suitable, as it isn't set on a
per-node basis. I think it would be /fairly/ straightforward to add to
HOD, as I detailed in my initial email, so that it "does the correct
thing" out of the box.
True, I did assume you obtained dedicated nodes. Operating HOD in this
manner has been considerably simpler, and, if I understand correctly,
it would also address your requirement.
According to hadoop-default.xml, the number of maps is "Typically set
to a prime several times greater than number of available hosts." -
If we relaxed this recommendation to read "Typically set to a
NUMBER several times greater than number of available hosts", it
should then be straightforward for HOD to set it automatically, shouldn't it?
Actually, AFAIK, the number of maps for a job is determined more or less
exclusively by the M/R framework, based on the number of input splits.
I've seen messages on this list before about how the documentation for
this configuration item is misleading. So changing this setting might
not make any difference at all, whatever value is specified.
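To illustrate why the value may matter so little, here is a simplified Python model of the split calculation. It assumes the behaviour of the old-API (org.apache.hadoop.mapred) FileInputFormat as I recall it: the configured map count is only a hint used to compute a goal split size, which is then capped at the block size. The function name estimate_map_tasks and the single-file input are hypothetical simplifications; real jobs also involve multiple files, the minimum split size setting and non-splittable formats.

```python
def estimate_map_tasks(file_size, block_size, num_maps_hint, min_split_size=1):
    """Rough model (assumption) of the old-API FileInputFormat split logic
    for a single splittable file."""
    # The configured map count is only a hint: it sets a goal size per split.
    goal_size = file_size // max(num_maps_hint, 1)
    # Splits are capped at the block size and floored at the minimum split size.
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = 0
    remaining = file_size
    # Carve full-size splits while the remainder exceeds 1.1 splits (the "slop").
    while remaining / split_size > 1.1:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes one final, possibly smaller, split.
    if remaining > 0:
        splits += 1
    return splits


MiB = 1024 * 1024
GiB = 1024 * MiB
for hint in (1, 2, 100):
    print(hint, estimate_map_tasks(GiB, 64 * MiB, hint))
```

With a 1 GiB file and 64 MiB blocks, hints of 1 and 2 both yield 16 maps, while a hint of 100 yields 100. Under this model the setting can push the map count above the number of blocks, but never below it.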
Thanks
Hemanth