Here at Veoh, we have committed to this style of file system in a very big way. We currently have around a billion files that we manage using replicated file storage.
We didn't go with HDFS for this, but the reasons probably do not apply in your case. In our case, we have lots (as in LOTS) of files, many of which are relatively small. The sheer number of files made it better for us to go with an alternative (MogileFS, heavily patched). That choice had issues as well, since Mogile was not very well engineered for scaling to true web scale. We felt then, and I think that this is still true, that putting the effort into Mogile to stabilize it was a bit easier than putting the effort into Hadoop to scale it.

We also run Hadoop for log processing and are very happy.

Our experience with the replicated file store lifestyle has been excellent. Both Mogile and HDFS have been really excellent in terms of reliable file storage and nearly 100% uptime.

For your application, I would be pretty confident that HDFS would be a very effective solution and would probably be much better than Mogile. The advantage over Mogile would largely be due to the fact that HDFS breaks files across servers, so the quantum of file management would be much smaller for your application.

Hiring Hadoop-experienced engineers is still difficult, but we have found that it takes very little time for engineers to come up to speed on using Hadoop, and HDFS concepts take even less time. With recent versions having file-system-level interfaces integrated, it should be even easier to manage these systems.

We are beginning to see Hadoop experience on resumes, but I would guess that there will be a lag in Europe before you begin to see that. There will also be some bias in our favor because we have a relatively high profile as a "cool" place to work.

On 5/16/08 1:22 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm currently trying to make the case for using Hadoop (or more
> precisely HDFS) as part of a storage architecture for a large media
> asset repository. HDFS will be used for storing up to a total of 1 PB
> of high-resolution video (average file size will be > 1GB). We think
> HDFS is a perfect match for these requirements but have to convince a
> rather conservative IT executive who is worried about
>
> - Hadoop not being mature enough (reference customers or case studies
>   for similar projects, e.g. TV stations running their video archives
>   using HDFS would probably help)
> - Hadoop knowledge being not widely available should our customer
>   choose to change their hosting partner or systems integrator (a list
>   of consulting/hosting firms having Hadoop expertise would help)
>
> Thanks in advance for any pointers which help me make the case.
>
> Robert
