Hi. I have a small cluster (9 nodes) running Hadoop here.
On this cluster, Hadoop processes thousands of directories sequentially. Each directory contains two input files for MapReduce, and the input files range from 1 MB to 5 GB in size. In other words, each Hadoop job takes one of these directories as its input. To get the best performance, which strategy would be appropriate for us, and which configuration would you suggest? PS: each node has 12 GB of physical memory.
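To make the current setup concrete, here is a minimal sketch of how we submit the jobs today, one job per directory, back to back. This is a dry run that only prints the commands it would issue; the jar name, main class, and directory names are hypothetical placeholders, not our real paths.

```shell
#!/bin/sh
# Dry-run sketch of our sequential, one-job-per-directory submission.
# myjob.jar, com.example.MyJob, and the directory names are placeholders.
INPUT_ROOT=/data/inputs    # assumed HDFS root holding the thousands of dirs
OUTPUT_ROOT=/data/outputs

# Stand-in for the thousands of directories we actually iterate over.
for dir in job_dir_1 job_dir_2 job_dir_3; do
  # In the real script this line runs "hadoop jar ..." and blocks
  # until the job finishes, so the jobs execute strictly one at a time.
  echo hadoop jar myjob.jar com.example.MyJob \
       "$INPUT_ROOT/$dir" "$OUTPUT_ROOT/$dir"
done
```

With many small inputs (the 1 MB end of our range), I suspect this per-directory approach wastes a lot of time on job startup overhead, which is part of why I am asking about a better strategy.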
