PVFS is going to be a lot more stable, performant and mature. Hadoop
is pretty young and growing fast.
PVFS's design point is really more of a caching layer than a
permanent file system. It does not really have the mechanisms to
deal with the kind of regular hardware failure we are aiming to
handle with Hadoop. So if you are using a large number of
inexpensive machines, data durability will be a problem with PVFS.
That said, today it is hard to argue that HDFS can provide any
reliable data durability either. In a few months I think HDFS will
have a serious advantage in this regard. Right now we think of HDFS
more as a caching layer as well.
One thing to point out is that we use a lot of nodes (600+). On
small clusters, these concerns are minor.
On Apr 22, 2006, at 7:51 AM, Chris Mattmann wrote:
Hi Folks,
Does anyone have any comparisons to Hadoop and PVFS? Even if they
aren't empirical, any advice on what the advantages/disadvantages of
one versus the other would be great. We're evaluating Hadoop right now
at my job, and I think it's really great, and easy to understand,
however I'm not an OS guy, so I don't know much about distributed file
systems, other than the 30,000 ft. picture. I'm also getting a lot of
push at my job to look at PVFS (since they have many releases, are at
version 2.0, etc.).
Any ideas?
Thanks!
Cheers,
Chris