I am part of a working group that is developing a Bigtable-like structured storage system for Hadoop HDFS (see http://wiki.apache.org/lucene-hadoop/Hbase).
I am interested in learning about large HDFS installations:

- How many nodes do you have in a cluster?
- How much data do you store in HDFS?
- How many files do you have in HDFS?
- Have you run into any limitations that have prevented you from growing your application?
- Are there limitations on how many files you can put in a single directory?

Google's GFS, for example, does not really implement directories per se, so it does not suffer from the performance problems that traditional file systems have when a single directory holds too many files.

The largest system I know about has about 1.5M files and about 150GB of data. If anyone has a larger system in use, I'd really like to hear from you. Were there particular obstacles you encountered in growing your system to that size?

Thanks in advance.

--
Jim Kellerman, Senior Engineer; Powerset
[EMAIL PROTECTED]
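
P.S. On the single-directory question: a common workaround on traditional file systems (and one that works on HDFS as well) is to hash file names into a fixed set of subdirectories so no one directory accumulates millions of entries. Below is a minimal sketch against the standard Hadoop FileSystem API; the class name, the 256-bucket count, and the /data root are illustrative assumptions on my part, not anything from HBase or HDFS itself.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShardedWriter {
  // Assumed base directory for this sketch.
  private static final Path ROOT = new Path("/data");

  // Map a logical file name to /data/<bucket>/<name>, where <bucket>
  // is derived from the name's hash. Masking with 0x7fffffff keeps
  // the hash non-negative before taking the modulus.
  static Path shardedPath(String name) {
    int bucket = (name.hashCode() & 0x7fffffff) % 256;
    return new Path(new Path(ROOT, String.format("%03d", bucket)), name);
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = shardedPath("example-file");
    // create() makes any missing parent directories for us.
    FSDataOutputStream out = fs.create(p);
    out.writeUTF("hello");
    out.close();
  }
}

With 256 buckets, 1.5M files works out to roughly 6K entries per directory, which is well within what most file systems handle comfortably.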