I am new to this group and relatively new to Hadoop. I am looking at building a large cluster. Does anyone have best practices for a cluster in the hundreds of nodes? Also, has anyone had experience with a cluster spanning multiple data centers? Is this a bad practice? A moderately bad practice? Insane?
Is it better to build a 1000-node cluster in a single data center? Do you back one of these things up to a second data center, or to a different 1000-node cluster? Sorry if these are crazy questions; I just want to learn the meta issues and opportunities involved in building clusters. Thanks for your ideas! Cheers, James
