Re: Is SAN storage is a good option for Hadoop ?

Brian Bockelman Thu, 29 Sep 2011 05:29:36 -0700

On Sep 29, 2011, at 1:50 AM, praveenesh kumar wrote:

> Hi,
> 
> I want to know whether we can use SAN storage for a Hadoop cluster setup.
> If yes, what are the best practices?
> 
> Is it a good approach, considering the fact that "the underlying power of
> Hadoop is co-locating the processing power (CPU) with the data storage and
> thus it must be local storage to be effective"?
> *But also, is it right to say “local is better” in a situation where I
> have a single local 5400 RPM IDE drive, which would be dramatically slower
> than SAN storage striped across many drives spinning at 10k RPM and
> accessed via Fibre Channel?*


Hi Praveenesh,

Two things:
1) If the option is a single 5400 RPM IDE drive (you can still buy those?) 
versus a high-end SAN, the high-end SAN is going to win.  That's often a false 
comparison: the real question is usually "What can I buy for $50k?".  In that 
case (setting aside organizational politics), you can buy more spindles in the 
"traditional" Hadoop setup than with the SAN.
  - Also, if you're latency limited, you're likely working against yourself.  
The best thing I ever did for my organization was make our software work just 
as well with 100ms latency as with 1ms latency.
2) As Paul pointed out, you have to ask yourself whether the SAN is shared or 
dedicated.  Many SANs don't have the ability to strongly partition workloads 
between users.
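
To put rough numbers on point 1, here is a minimal sketch of the "What can I 
buy for $50k?" comparison.  The function name and every price and disk count 
below are hypothetical placeholders, not real quotes; plug in your own vendor 
numbers.

```python
def disks_for_budget(budget, unit_cost, disks_per_unit, fixed_cost=0):
    """Return how many disks (spindles) a budget buys.

    unit_cost      -- price of one purchasable unit (a node or a disk shelf)
    disks_per_unit -- spindles in each unit
    fixed_cost     -- up-front cost paid once (e.g. controllers, fabric)
    """
    usable = budget - fixed_cost
    if usable < 0:
        return 0
    # Only whole units can be bought, hence integer division.
    return (usable // unit_cost) * disks_per_unit


BUDGET = 50_000

# Hypothetical: commodity nodes at $5k each with 12 local disks.
local = disks_for_budget(BUDGET, unit_cost=5_000, disks_per_unit=12)

# Hypothetical: SAN shelves at $10k each with 12 disks, plus $20k of
# controller and fabric cost paid up front.
san = disks_for_budget(BUDGET, unit_cost=10_000, disks_per_unit=12,
                       fixed_cost=20_000)

print(f"local spindles: {local}, SAN spindles: {san}")
```

With these made-up inputs the local setup ends up with more spindles; the 
point is only that the comparison should be made per dollar, not per drive.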

Brian
