On 4/10/08 4:42 AM, "Todd Troxell" <[EMAIL PROTECTED]> wrote:

> Hello list,

Howdy.

> I am interested in using HDFS for storage, and for map/reduce only
> tangentially. I see clusters mentioned in the docs with many many nodes
> and 9TB of disk.
>
> Is HDFS expected to scale to > 100TB?

We're running file systems in the 2-6PB range.

> Does it require massive parallelism to scale to many files? For instance,
> do you think it would slow down drastically in a 2 node 32T config?

The biggest gotcha is the name node. You need to feed it lots and lots of
memory. Keep in mind that Hadoop functions better with fewer large files
than many small ones.
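To see why the name node's memory matters, here's a back-of-the-envelope sketch. It assumes the commonly cited rule of thumb of roughly 150 bytes of name node heap per namespace object (each file, directory, and block counts as one object); the exact figure varies by version, so treat the numbers as illustrative only:

```python
# Rough namenode heap estimate (rule-of-thumb sketch, not a measurement).
BYTES_PER_OBJECT = 150  # assumed ~150 bytes of heap per namespace object

def namenode_heap_bytes(num_files, blocks_per_file=1, num_dirs=0):
    """Estimate namenode heap for a namespace of the given shape.

    Each file, each block, and each directory is counted as one
    in-memory namespace object on the name node.
    """
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * BYTES_PER_OBJECT

# Same total block count (10M blocks), packaged two ways:
small = namenode_heap_bytes(10_000_000, blocks_per_file=1)   # 10M small files
large = namenode_heap_bytes(100_000, blocks_per_file=100)    # 100k large files
print(f"10M small files:  ~{small / 2**30:.1f} GiB of namenode heap")
print(f"100k large files: ~{large / 2**30:.1f} GiB of namenode heap")
```

The per-file metadata overhead is why many small files hurt: for the same amount of data, the small-file layout needs roughly twice the name node heap in this sketch.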