Hi Vincent,

Maybe you'd be better off using something like MogileFS for this application?
-Todd

On Tue, Oct 11, 2011 at 1:39 AM, Vincent Boucher <vin.bouc...@gmail.com> wrote:
> Hello again,
>
> * Our case:
>
> Most of the files we are dealing with are 10GB in size. Our HDFS configuration
> would be the following: data is stored on mass storage servers (10x50TB), each
> with RAID6; no replicas for the data.
>
> With a 64MB HDFS block size, it is extremely likely that all of our 10GB files
> will be spread over all the mass storage servers. Consequently, having one of
> these servers down/dead would corrupt the full filesystem (all of the 10GB
> files). Not great.
>
> Opting for bigger blocks (blocks of 12.5GB [= 200x64MB]) would reduce the
> spread: each file's contents would be stored on a single server. Having one
> server down/dead would then corrupt only 10% of the files in the filesystem
> (since there are 10 servers). That is much easier to regenerate/re-download
> from the other Tiers than doing it for the full filesystem, as in the case of
> the 64MB blocks.
>
> * Questions:
>
> Is HDFS suitable for a huge block size (12.5GB)?
>
> Do you have experience running HDFS with such a block size?
>
> Cheers,
>
> Vincent

--
Todd Lipcon
Software Engineer, Cloudera
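[For reference, the block size in question does not have to be a cluster-wide
setting: it can be specified per file when writing through the HDFS Java API.
The sketch below is illustrative and not from this thread; it assumes the
standard FileSystem.create overload that takes replication and block size, and
the destination path and payload are placeholders.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BigBlockWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // 12.5GB block, i.e. 200 x 64MB, so a 10GB file fits in one block
        long blockSize = 200L * 64 * 1024 * 1024;
        // no replicas, as in the setup described above
        short replication = 1;
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);

        // hypothetical destination path
        Path dst = new Path("/data/sample-10GB.dat");
        FSDataOutputStream out = fs.create(dst, true, bufferSize, replication, blockSize);
        try {
            // ... write the 10GB payload here ...
        } finally {
            out.close();
        }
    }
}

[The cluster-wide default was governed by dfs.block.size in hdfs-site.xml in
the Hadoop releases of that era (later renamed dfs.blocksize). Whether the
NameNode and DataNodes behave well with 12.5GB blocks is exactly the open
question posed above.]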