My organization has decided to make a substantial investment in hardware for processing Hadoop jobs. Our cluster will be used by multiple groups, so it's hard to classify the workload as IO, memory, or CPU bound. Would others be willing to share their hardware profiles, along with the problem types they run (memory bound, CPU bound, etc.)? Our existing cluster is made up of the following machines:

PowerEdge 1655  2x2 Intel Xeon 1.4GHz  2GB RAM  72GB local HD
PowerEdge 1855  2x2 Intel Xeon 3.2GHz  8GB RAM  146GB local HD
PowerEdge 1955  2x2 Intel Xeon 3.0GHz  4GB RAM  72GB local HD

Obviously, we would like to increase local disk space, memory, and the number of cores. The not-so-obvious decision is whether to select high-end equipment (fewer machines) or lower-end hardware (more machines). We're trying to balance "how commodity" the hardware is against the administration costs. I've read the machine scaling material on the Hadoop wiki. Any additional real-world advice would be awesome.

Thanks,
Justin
