Where can I find information about which config parameters can be set as
per-node properties, and which ones apply to all nodes? For example, I have a
cluster consisting of two classes of nodes. One class is dual-core 4GB memory
nodes, and the other class is 16-core 128GB memory nodes. It certainly makes
sense to configure them differently. So the question is, which parameters
should I pay attention to? I vaguely know that at least the following ones
can probably be set as node-specific:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
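
To make the idea concrete, this is roughly what I imagine putting in the
config file on the 16-core nodes only. It is just a sketch: the file name
(mapred-site.xml, or hadoop-site.xml on older releases) and the values 12
and 4 are my own guesses, not tested settings.

```xml
<?xml version="1.0"?>
<!-- Hypothetical per-node config for the 16-core 128GB nodes only. -->
<!-- The slot counts below are illustrative assumptions, not recommendations. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>12</value> <!-- assumed value for the large nodes -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value> <!-- assumed value for the large nodes -->
  </property>
</configuration>
```

The dual-core 4GB nodes would get the same file with smaller values.
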
But is there anything beyond that? What about the following ones? Can I set
them as node-specific parameters too?
mapred.child.java.opts
tasktracker.http.threads
dfs.datanode.handler.count
io.sort.factor
io.sort.mb
mapred.inmem.merge.threshold
mapred.job.reduce.input.buffer.percent
Thanks!
Zhang