A short, slightly off-topic question:

> Also note that in this configuration that one cannot take
> advantage of the "keep the machine up at all costs" features in newer
> Hadoop's, which require that root, swap, and the log area be mirrored
> to be truly effective. I'm not quite convinced that those features are
> worth it yet for anything smaller than maybe a 12 disk config.

Dell and Cloudera promote the C2100. I'd like to see the calculations behind that config.

Am I wrong in thinking that keeping your cluster up with such dense nodes will only work if you have many of them (on the order of 100+), interconnected with 10Gb Ethernet? If you don't, then recovery times from failing disks or rack switches are going to get crazy, right?

If you want bang for the buck, don't the ratios "disk IO / processing power", "node storage capacity / Ethernet speed" and "total number of nodes / Ethernet speed" point to many small nodes with not too many disks and 1Gb Ethernet?

Cheers,
Evert
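P.S. To make the recovery-time worry concrete, here is a rough back-of-envelope sketch. It is only an illustration under assumptions of my own: the function name, the 24 TB per node figure, and the 50% usable-bandwidth factor are made up, and it ignores rack topology, disk throughput, and any throttling Hadoop applies to re-replication. It estimates how long the surviving nodes would need to re-copy a dead node's data if network bandwidth were the only limit.

```python
def rereplication_hours(node_tb, nodes, link_gbps, usable_fraction=0.5):
    """Rough lower bound, in hours, to re-replicate one failed node's data.

    Assumptions (hypothetical, not measured):
      - the failed node held node_tb terabytes (decimal, 1e12 bytes each)
      - the copy work spreads evenly over the remaining nodes - 1 machines
      - each machine can spend usable_fraction of its link on recovery
      - the network is the only bottleneck
    """
    if nodes < 2:
        raise ValueError("need at least two nodes to re-replicate")
    data_bits = node_tb * 1e12 * 8
    aggregate_bits_per_s = (nodes - 1) * link_gbps * 1e9 * usable_fraction
    return data_bits / aggregate_bits_per_s / 3600


if __name__ == "__main__":
    # Illustrative configurations only; none of these are vendor figures.
    scenarios = [
        ("10 dense nodes, 1GbE", 24, 10, 1),
        ("100 dense nodes, 10GbE", 24, 100, 10),
        ("60 small nodes, 1GbE", 4, 60, 1),
    ]
    for label, node_tb, nodes, link_gbps in scenarios:
        hours = rereplication_hours(node_tb, nodes, link_gbps)
        print(f"{label}: {hours:.2f} h")
```

Under this simple model the recovery time scales with "node storage capacity / (number of nodes x Ethernet speed)", which is the ratio I am asking about. A rack-switch failure would take out many nodes at once and look correspondingly worse.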
