I am part of a working group that is developing a Bigtable-like structured
storage system for Hadoop HDFS (see
http://wiki.apache.org/lucene-hadoop/Hbase).

I am interested in learning about large HDFS installations:

- How many nodes do you have in a cluster?

- How much data do you store in HDFS?

- How many files do you have in HDFS?

- Have you run into any limitations that have prevented you from growing
  your application?

- Are there limitations on how many files you can put in a single directory?

  Google's GFS, for example, does not really implement directories per se,
  so it does not suffer from the performance problems that traditional file
  systems have when a directory holds too many files (a rough probe sketch
  follows below).
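To make that last question concrete, below is a rough, untested sketch of
how one might probe per-directory behavior through the Hadoop FileSystem
API; the path and file count are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough probe: create N empty files in one HDFS directory, then time a
// directory listing. The path and N are arbitrary illustrative values.
public class DirListProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path dir = new Path("/tmp/dir-probe");  // hypothetical test directory
    int n = 100000;                         // arbitrary file count
    fs.mkdirs(dir);
    for (int i = 0; i < n; i++) {
      fs.create(new Path(dir, "f" + i)).close();  // create an empty file
    }

    long start = System.currentTimeMillis();
    FileStatus[] entries = fs.listStatus(dir);
    long elapsed = System.currentTimeMillis() - start;
    System.out.println(entries.length + " entries listed in " + elapsed + " ms");
  }
}

If anyone has run something like this at various file counts, the timings
would be useful data points.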

The largest system I know of has about 1.5M files and about 150GB of data.
If anyone has a larger system in use, I'd really like to hear from you.
What obstacles, if any, did you run into while growing your system to that
size?

Thanks in advance.
-- 
Jim Kellerman, Senior Engineer; Powerset                [EMAIL PROTECTED]
