Good point, Todd. I was speaking from the experience of people I know who are using 0.20.x.
On Thu, Jun 30, 2011 at 5:24 PM, Todd Lipcon <[email protected]> wrote:

> On Thu, Jun 30, 2011 at 5:16 PM, Ted Dunning <[email protected]> wrote:
>
> > You have to consider the long-term reliability as well.
> >
> > Losing an entire set of 10 or 12 disks at once makes the overall reliability
> > of a large cluster very suspect. This is because it becomes entirely too
> > likely that two additional drives will fail before the data on the off-line
> > node can be replicated. For 100 nodes, that can decrease the average time
> > to data loss down to less than a year. This can only be mitigated in stock
> > hadoop by keeping the number of drives relatively low. MapR avoids this by
> > not failing nodes for trivial problems.
>
> I'd advise you to look at "stock hadoop" again. This used to be true, but
> was fixed a long while back by HDFS-457 and several followup JIRAs.
>
> If MapR does something fancier, I'm sure we'd be interested to hear about it
> so we can compare the approaches.
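For readers who want to see the shape of the calculation behind the double-failure claim, here is a minimal back-of-envelope sketch in Python. Every parameter in it (per-disk MTTF, re-replication window, node fail-out rate, block-overlap fraction) is an illustrative assumption, not a number from this thread, and the resulting estimate swings by orders of magnitude depending on how often nodes are failed out wholesale and how long re-replication takes, which is exactly the point under dispute above.

```python
import math

# Illustrative estimate of the risk Ted describes: a whole node's disks go
# offline at once (3x replication assumed), and data is lost if two more
# disks holding the remaining replicas of some block fail before the dead
# node's data is re-replicated. All numbers below are assumptions.

nodes = 100                      # cluster size from the thread's example
disks_per_node = 12              # disks lost when one node is failed out
disk_mttf_hours = 500_000        # assumed per-disk MTTF (~57 years)
recovery_hours = 8               # assumed window to re-replicate the node
node_fail_events_per_year = 50   # assumed rate of whole-node fail-outs

remaining_disks = nodes * disks_per_node - disks_per_node

# Expected number of additional disk failures during one recovery window,
# modeling disk failures as a Poisson process:
lam = remaining_disks * recovery_hours / disk_mttf_hours

# Poisson tail: probability of >= 2 further failures within the window.
p_two_more = 1 - math.exp(-lam) * (1 + lam)

# With random block placement, only some pairs of failed disks hold both
# remaining replicas of a block from the dead node; assumed fraction here.
p_pair_loses_block = 0.1

loss_events_per_year = node_fail_events_per_year * p_two_more * p_pair_loses_block
print(f"expected extra disk failures per window: {lam:.3f}")
print(f"P(>=2 more failures in window):          {p_two_more:.2e}")
print(f"estimated data-loss events per year:     {loss_events_per_year:.2e}")
```

With a high fail-out rate (nodes evicted for trivial problems) and a long recovery window, the loss rate in this model climbs quickly; with per-disk eviction of the kind HDFS-457 introduced, the window and the amount of data at risk both shrink.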
