On 01/07/2011 01:16, Ted Dunning wrote:
> You have to consider the long-term reliability as well.
> Losing an entire set of 10 or 12 disks at once makes the overall
> reliability of a large cluster very suspect. This is because it
> becomes entirely too likely that two additional drives will fail
> before the data on the off-line node can be replicated. For 100
> nodes, that can decrease the average time to data loss down to less
> than a year.
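To put rough numbers on that argument, here's a back-of-envelope
sketch; the failure rate, re-replication window, block count and
outage rate are all assumptions I've made up, and the answer swings by
orders of magnitude as you change them:

  # Back-of-envelope risk of losing data while a whole node's worth of
  # blocks sits at replication 2.  Every constant is an assumption.
  from math import exp

  nodes          = 100
  disks_per_node = 12
  disk_afr       = 0.05     # annual failure rate per disk (assumed)
  window_h       = 12.0     # hours to re-replicate the dead node (assumed)
  blocks_on_node = 200000   # blocks left with only 2 replicas (assumed)

  other_disks = (nodes - 1) * disks_per_node
  p_disk = disk_afr * window_h / (365.0 * 24)   # P(given disk dies in window)

  # Poisson approximation: chance of >= 2 further disk failures in the window.
  lam = other_disks * p_disk
  p_two_more = 1 - exp(-lam) * (1 + lam)

  # With random placement, each at-risk block's two surviving replicas sit
  # on one specific pair of the other disks; given two extra failures, data
  # is lost if any block maps onto exactly that pair.
  pairs = other_disks * (other_disks - 1) // 2
  p_hit = 1 - (1 - 1.0 / pairs) ** blocks_on_node

  p_per_incident = p_two_more * p_hit
  incidents_per_year = nodes * 0.5              # whole-node outages/year, assumed
  p_per_year = 1 - (1 - p_per_incident) ** incidents_per_year

  print("P(data loss per node outage): %.1e" % p_per_incident)
  print("P(data loss per year):        %.1e" % p_per_year)

The point isn't the exact figure, it's how quickly it degrades as the
re-replication window stretches out and the cluster grows.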
There's also Rodrigo's work on alternate block placement that doesn't
scatter blocks quite so randomly across a cluster, so the loss of a
node or rack doesn't have adverse effects on so many files:
https://issues.apache.org/jira/browse/HDFS-1094
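The gist, as I read it (this is my own toy sketch of the idea, not the
actual patch; node count, group size and block count are all made up):

  # Toy Monte-Carlo of random vs group-constrained placement, replication = 3.
  import random

  NODES, GROUP_SIZE, BLOCKS, TRIALS = 60, 6, 100000, 5000
  nodes = list(range(NODES))
  groups = [nodes[i:i + GROUP_SIZE] for i in range(0, NODES, GROUP_SIZE)]

  def place_random():
      # three replicas anywhere in the cluster
      return frozenset(random.sample(nodes, 3))

  def place_grouped():
      # three replicas confined to one small node group
      return frozenset(random.sample(random.choice(groups), 3))

  def loss_rate(place):
      used = {place() for _ in range(BLOCKS)}          # replica triples in use
      lost = sum(frozenset(random.sample(nodes, 3)) in used
                 for _ in range(TRIALS))               # random 3-node failures
      return lost / float(TRIALS)

  print("P(3-node failure loses a block), random :", loss_rate(place_random))
  print("P(3-node failure loses a block), grouped:", loss_rate(place_grouped))

With numbers like these the random layout loses something on nearly
every 3-node failure, while the grouped layout only loses when all
three dead nodes land in the same group; the flip side is that when it
does lose, it loses a much bigger chunk in one go.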
Given that most HDD failures happen on cluster reboot, it is possible
for all 10-12 disks in a node not to come up at the same time if the
cluster has been up for a while. But like Todd says: worry. At least
a bit.
I've heard hints of one FS that actually includes HDD batch data in
block placement, to try to scatter replicas across batches and to bias
towards using new HDDs for temp storage during burn-in. Some research
work on doing that to HDFS could be something to keep a postgraduate
busy for a while: "Disk batch-aware block placement".
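Nothing like this exists today, so purely as a sketch of what such a
policy might look like -the Disk class, BURN_IN_DAYS and both
functions here are invented for illustration, not any real HDFS API:

  # Hypothetical "disk batch-aware" placement sketch.  Everything here
  # is made up to show the idea.
  from dataclasses import dataclass
  import random

  @dataclass
  class Disk:
      node: str
      batch: str        # manufacturer batch / date code
      age_days: int

  BURN_IN_DAYS = 30

  def pick_replica_disks(disks, existing, n=3):
      """Pick n disks for a block, biased away from batches already used."""
      chosen = list(existing)
      candidates = [d for d in disks if d not in chosen]
      random.shuffle(candidates)
      while len(chosen) < n and candidates:
          used_batches = {d.batch for d in chosen}
          # prefer a disk from an unused batch that is past burn-in;
          # fall back to anything if no such disk exists
          best = next((d for d in candidates
                       if d.batch not in used_batches
                       and d.age_days >= BURN_IN_DAYS),
                      candidates[0])
          chosen.append(best)
          candidates.remove(best)
      return chosen

  def pick_temp_disk(disks):
      """Bias temp/spill storage toward disks still in burn-in."""
      young = [d for d in disks if d.age_days < BURN_IN_DAYS]
      return random.choice(young or disks)

  # tiny demo with 36 invented disks across 3 nodes and 4 batches
  disks = [Disk("node%d" % (i // 12), "batch-%d" % (i % 4),
                random.randint(1, 400)) for i in range(36)]
  print([(d.node, d.batch) for d in pick_replica_disks(disks, [])])
  print(pick_temp_disk(disks))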
> This can only be mitigated in stock hadoop by keeping the number of
> drives relatively low.
Now I'm confused. Do you mean # of HDDs/server, or HDDs/filesystem?
Because it seems to me that "stock" HDFS's use in production makes it
one of the filesystems on the planet with the largest number of
non-RAIDed HDDs out there -things like Lustre and IBM GPFS go for
RAID, as does HP IBRIX (the last two of which have some form of Hadoop
support too, if you ask nicely). HDD/server numbers matter in that in
a small cluster it's better to have fewer disks per machine, so you
get more servers to spread the data over; you don't really want your
100 TB in three 1U servers (rough numbers on why below). As your
cluster grows -and you care more about storage capacity than raw
compute- the appeal of 24+ TB/server starts to look good, and that's
when you care about the improvements to datanodes handling the loss of
a single worker disk better. Even without that, rebooting the DN may
fix things, but the impact on ongoing work is the big issue -you don't
just lose a replicated block, you lose the unreplicated intermediate
data of whatever work was running on that node.
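Rough arithmetic on why fewer, fatter nodes hurt; the NIC speed and
usable fraction are assumptions, swap in your own:

  # Disks-per-server trade-off: how long a dead node's data takes to
  # re-replicate.  All numbers are assumptions.
  total_tb = 100.0
  nic_gbit = 1.0     # per-node network, Gbit/s (assumed, 2011-ish)
  usable   = 0.5     # fraction of that usable for re-replication (assumed)

  def node_loss(servers):
      tb_per_node = total_tb / servers
      # surviving nodes re-replicate the dead node's blocks in parallel
      agg_gbit = (servers - 1) * nic_gbit * usable
      hours = tb_per_node * 8 * 1024 / agg_gbit / 3600
      return tb_per_node, hours

  for servers in (3, 10, 50):
      tb, hours = node_loss(servers)
      print("%3d servers: %5.1f TB/node, ~%5.1f h to re-replicate a dead node"
            % (servers, tb, hours))

The longer that re-replication window, the longer the dead node's
blocks sit at replication 2, which is exactly the exposure Ted
describes above.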
Cascade failures leading to cluster outages are a separate issue, and
are more often triggered by switch failure or misconfiguration than
anything else. It doesn't matter how reliable the hardware is if it
gets the wrong configuration.