Hello,
  I'm looking at storing a large number of files under one directory. 

I started breaking the files into subdirectories out of habit (from working on 
NTFS and the like), but it occurred to me that, from a performance perspective, 
it may not really matter on HDFS.

Does it? Is there a recommended limit on the number of files to store in a 
single directory on HDFS? I'm thinking thousands to millions, so we're not 
talking about INT_MAX or anything, but a lot.

Or is it only limited by my sanity :) ?

I suppose it comes down to the data structure(s) the namenode uses to track 
file metadata, but I don't know what those are. I did skim the HDFS 
architecture document and didn't see anything conclusive.
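For a rough sense of scale: the namenode holds the entire namespace in memory, and a commonly cited rule of thumb is on the order of 150 bytes of heap per namespace object (file, directory, or block). A back-of-envelope sketch, where the per-object figure is that approximation and the helper name is my own, not anything from Hadoop's API:

```python
# Rough namenode heap estimate. BYTES_PER_OBJECT is the commonly cited
# ~150-bytes-per-namespace-object rule of thumb, not a hard spec.
BYTES_PER_OBJECT = 150

def estimate_namenode_bytes(num_files, blocks_per_file=1):
    """Estimate namenode heap consumed by num_files files."""
    # Each file costs one inode entry plus one entry per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# A million single-block files:
print(estimate_namenode_bytes(1_000_000))  # 300000000 (~300 MB of heap)
```

So even at the millions-of-files end of my range, the metadata footprint is driven by the total object count, regardless of whether the files sit in one directory or many.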

Take care,
  -stu