PVFS is going to be a lot more stable, performant, and mature than Hadoop, which is still pretty young and growing fast.

PVFS's design point is really more of a caching layer than a permanent file system. It does not really have the mechanisms to deal with the kind of regular hardware failure we are aiming to handle with Hadoop. So if you are using a large number of inexpensive machines, data durability will be a problem with PVFS. That said, today it is hard to argue that HDFS provides reliable data durability either. In a few months I think HDFS will have a serious advantage in this regard. Right now we think of it more as a caching layer as well.

One thing to point out is that we use a lot of nodes (600+). On small clusters, these concerns are minor.



On Apr 22, 2006, at 7:51 AM, Chris Mattmann wrote:

Hi Folks,

Does anyone have any comparisons of Hadoop and PVFS? Even if they aren't empirical, any advice on the advantages/disadvantages of one versus the other would be great. We're evaluating Hadoop right now at my job, and I think it's really great and easy to understand. However, I'm not an OS guy, so I don't know much about distributed file systems beyond the 30,000 ft. picture. I'm also getting a lot of push at my job to look at PVFS (since they have many releases, are at version 2.0, etc.).

  Any ideas?

Thanks!

Cheers,
   Chris




