Hi folks, I'm still a noob in the hadoop world, so I apologize if this has already been asked and answered. This thread seems pretty recent, so hopefully it's OK if I jump in. I trust folks to politely correct me if I'm way off base. (This is not really a question per se, but more a request for comments/feedback.)
This question from M.Shiva spawned a discussion about the difference between replication and backup/restore.

> On 12/19/07 11:17 PM, "M.Shiva" wrote:
>> 5. Can we take backup and restore the files written to hadoop
>
> On 12/20/07 12:05 AM, "Ted Dunning" wrote:
> Obviously, yes.
>
> But, again, the point of hadoop's file system is that it makes this largely
> unnecessary because of file replication.

From: Joydeep Sen Sarma, Thu, 20 Dec 2007 08:53:38 -0800

> agreed - i think for anyone who is thinking of using hadoop as a place from
> where data is served - has to be disturbed by lack of data protection.
>
> replication in hadoop provides protection against hardware failures. not
> software failures. backups (and depending on how they are implemented -
> snapshots) protect against errant software. we have seen evidence of the
> namenode going haywire and causing block deletions/file corruptions at least
> once. we have seen more reports of the same nature on this list. i don't
> think hadoop (and hbase) can reach their full potential without a safeguard
> against software corruptions.
>
> (i don't think the traditional notion of backing up to tape (or even virtual
> tape - which is really what our filers are becoming) is worth discussing. for
> large data sets - the restore time would be so bad as to render these
> useless as a recovery path).

I think both answers are right: replication protects against most "normal" failures, and yet certain not-unheard-of events can still cause catastrophic data loss. These might include software problems, multiple simultaneous node failures, or a malicious insider deciding to rm files. I agree with Joydeep that it's a little disturbing. In its current design, Hadoop's DFS is probably not well suited to applications where losing the data would mean losing your business, or even significant revenue. Levels of data protection beyond replication and checksums probably weren't even among the original design goals; after all, it started out as a distributed computing project, right?

There are some types of files I don't care about losing, and there are others for which replication level 4 would still not be enough. After all, if I lose power to a datacenter and I'm using older disks, it wouldn't be surprising if a dozen or more disks failed to come back, and if the cluster is well balanced, *some* number of my important blocks will lose all of their replicas.

At the other end of the spectrum, there are also some files for which I'd like a very basic level of protection that doesn't cost 2x or 3x as much as no replication at all. If a GB of disk costs me $0.30 ($0.50 if you count the servers, switches, etc.), then replication level 2 costs me $1.00 and replication level 3 costs me $1.50. But what if every 5th block were a parity block that could be used to reconstruct any one of the 4 other blocks? I'd still be protected against losing 1 node, but at a total cost of about $0.60 instead of $1.00. (Some smart programmer might even write a map/reduce program to replace the failed blocks quickly :) Losing 2 or more nodes at once would still mean a chance that some block loses both a replica and its companion, or a replica and its parity, but perhaps that level of risk is acceptable for some applications.
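To make the parity idea a little more concrete, here's a rough sketch in plain Java. It's definitely not real Hadoop code; the block size, the group size of 4, and every class and method name are made up for illustration. The parity block is just the XOR of the 4 data blocks, and any single lost block can be rebuilt by XOR-ing the parity with the 3 survivors:

    // Toy illustration of the "every 5th block is parity" idea.  Plain Java,
    // not Hadoop code; block size, group size, and all names are invented.
    import java.util.Arrays;
    import java.util.Random;

    public class ParityGroupSketch {

        // Parity block = XOR of the 4 data blocks in the group.
        static byte[] computeParity(byte[][] dataBlocks) {
            byte[] parity = new byte[dataBlocks[0].length];
            for (byte[] block : dataBlocks) {
                for (int i = 0; i < parity.length; i++) {
                    parity[i] ^= block[i];
                }
            }
            return parity;
        }

        // Rebuild one lost block by XOR-ing the parity with the survivors.
        static byte[] reconstruct(byte[][] survivors, byte[] parity) {
            byte[] rebuilt = parity.clone();
            for (byte[] block : survivors) {
                for (int i = 0; i < rebuilt.length; i++) {
                    rebuilt[i] ^= block[i];
                }
            }
            return rebuilt;
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);
            byte[][] data = new byte[4][64];   // 4 tiny stand-in "blocks"
            for (byte[] block : data) {
                rnd.nextBytes(block);
            }
            byte[] parity = computeParity(data);

            // Pretend the node holding block 2 died; we still have 0, 1, 3 + parity.
            byte[][] survivors = { data[0], data[1], data[3] };
            byte[] rebuilt = reconstruct(survivors, parity);
            System.out.println(Arrays.equals(rebuilt, data[2]));   // prints "true"
        }
    }

Storage-wise that's 5 blocks to hold 4 blocks' worth of data, i.e. 1.25x the raw size, which is where my roughly-$0.60 figure comes from (1.25 * $0.50 = $0.625). A map/reduce job that finds groups with a missing member and re-runs the XOR is exactly the kind of thing I was imagining above.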
> this question came up a couple of days back as well. one option is switching
> over to solaris+zfs as a way of taking data snapshots. the other option is
> having two hdfs instances (ideally running different versions) and replicating
> data amongst them. both have clear downsides.

Since you mentioned ZFS, I went and looked at it today, and it definitely is all kinds of cool. ZFS is an excellent example of a robust, feature-rich filesystem, at least if it does what its documentation claims. It can take arbitrarily large numbers of disks and combine them into one huge pool of storage for large numbers of huge files, and it has checksumming built in. It also has replication (ditto blocks), snapshots, clones, and other sexy features like RAID-Z (which I'm calling "replication level 1.2"). (A lot has been written about it, but I found the Wikipedia entry most useful for a quick overview: http://en.wikipedia.org/wiki/ZFS)

The big thing I want it to have, and it doesn't, is "being distributed". It can handle many disks (theoretically billions), as long as they're all attached to the same Solaris kernel (yeah, right). Meanwhile, Hadoop *is* distributed, and is really great at moving large numbers of large blocks around among large numbers of nodes. So I'm thinking we really need to get these two together... I think they would get along famously.

I would really *love* to see Hadoop pick up some of the same features, especially snapshot/clone and parity blocks. I'm guessing it won't do so in the near future, but hopefully some other product will come along soon that does for the distributed-storage world what ZFS does for a single machine.
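In case it helps clarify why I keep coming back to snapshots, here's a toy copy-on-write sketch in plain Java. It has nothing to do with ZFS internals or with how Hadoop's namenode actually tracks blocks, and every name in it is invented. The point is just that a snapshot copies block *pointers* rather than block contents (so it's cheap), and later writes allocate new blocks instead of overwriting old ones, which is exactly what protects yesterday's data from the errant-software case Joydeep described:

    // Toy copy-on-write snapshot -- nothing to do with ZFS internals or with
    // how Hadoop's namenode tracks blocks; all names here are invented.
    import java.util.ArrayList;
    import java.util.List;

    public class CowSnapshotSketch {

        // A "file" is just an ordered list of references to immutable block contents.
        static class BlockMap {
            final List<byte[]> blocks = new ArrayList<byte[]>();
            BlockMap() {}
            BlockMap(BlockMap other) { blocks.addAll(other.blocks); } // copy pointers, not data
        }

        private final BlockMap live = new BlockMap();
        private final List<BlockMap> snapshots = new ArrayList<BlockMap>();

        // A snapshot copies only the block pointers, which is why it is cheap.
        void takeSnapshot() {
            snapshots.add(new BlockMap(live));
        }

        // A write never overwrites an old block; it stores new contents and repoints.
        void write(int blockIndex, byte[] newContents) {
            while (live.blocks.size() <= blockIndex) {
                live.blocks.add(new byte[0]);
            }
            live.blocks.set(blockIndex, newContents.clone());
        }

        byte[] readLive(int blockIndex) {
            return live.blocks.get(blockIndex);
        }

        byte[] readSnapshot(int snapId, int blockIndex) {
            return snapshots.get(snapId).blocks.get(blockIndex);
        }

        public static void main(String[] args) {
            CowSnapshotSketch fs = new CowSnapshotSketch();
            fs.write(0, "version 1".getBytes());
            fs.takeSnapshot();                                   // cheap: shares block 0
            fs.write(0, "garbage from a buggy job".getBytes());  // errant software strikes
            System.out.println(new String(fs.readSnapshot(0, 0))); // "version 1"
            System.out.println(new String(fs.readLive(0)));         // "garbage from a buggy job"
        }
    }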
