On Sep 29, 2011, at 1:50 AM, praveenesh kumar wrote:

> Hi,
>
> I want to know can we use SAN storage for Hadoop cluster setup ?
> If yes, what should be the best pratices ?
>
> Is it a good way to do considering the fact "the underlining power of Hadoop
> is co-locating the processing power (CPU) with the data storage and thus it
> must be local storage to be effective".
> *But also, is it better to say “local is better” in the situation where I
> have a single local 5400 RPM IDE drive, which would be dramatically slower
> than SAN storage striped across many drives spinning at 10k RPM and
> accessed via fiber channel ?*

Hi Praveenesh,

Two things:

1) If the option is a single 5400 RPM IDE drive (you can still buy those?) versus a high-end SAN, the high-end SAN is going to win. That's often a false comparison, though: the real question is usually "What can I buy for $50k?". In that case (setting aside organizational politics), you can buy more spindles in the "traditional" Hadoop setup than with the SAN.

- Also, if you're latency limited, you're likely working against yourself. The best thing I ever did for my organization was make our software work just as well with 100ms latency as with 1ms latency.

2) As Paul pointed out, you have to ask yourself whether the SAN is shared or dedicated. Many SANs don't have the ability to strongly partition workloads between users.

Brian
