Lohit is right.
File creation will be slow if all 100,000 files are in one directory.
Directory entries are implemented as a sorted array (ArrayList),
which makes lookup fast (binary search), but makes entry insertion
inefficient because every entry after the insert point must be
shifted over to make room.
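
For illustration, a minimal sketch of that trade-off (not the actual
HDFS code, just the shape of the data structure):

import java.util.ArrayList;
import java.util.Collections;

class SortedDirectory {
    private final ArrayList<String> entries = new ArrayList<String>();

    // Lookup is O(log n): binary search over the sorted entries.
    boolean contains(String name) {
        return Collections.binarySearch(entries, name) >= 0;
    }

    // Creation is O(n): add(index, e) shifts every entry after the
    // insert point one slot to the right, so creating n files in a
    // single directory costs O(n^2) overall.
    void create(String name) {
        int pos = Collections.binarySearch(entries, name);
        if (pos < 0) {
            entries.add(-pos - 1, name);
        }
    }
}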
This should be fixed at some point. For now, if you don't mind the
create performance (which is still negligible compared to big-file
writes) you can use large directories; otherwise (lots of small files)
split them.
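
One common way to split them is to hash each file name into a fixed
set of subdirectory buckets. The bucket count (256) and path layout
below are assumptions for illustration, not an HDFS convention:

public class Buckets {
    // Hypothetical helper: spreads files across 256 subdirectories
    // so no single directory grows unboundedly.
    static String bucketedPath(String baseDir, String fileName) {
        // normalize Java's possibly-negative % result into [0, 256)
        int bucket = ((fileName.hashCode() % 256) + 256) % 256;
        return String.format("%s/%02x/%s", baseDir, bucket, fileName);
    }

    public static void main(String[] args) {
        // e.g. /data/53/part-00042 instead of /data/part-00042
        System.out.println(bucketedPath("/data", "part-00042"));
    }
}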
--Konstantin

Lohit wrote:
The last time I tried to load an image with lots of files in the same
directory, it was about ten times slower. This is due to the data
structures. My numbers were in the millions, though. Try to use a
directory structure.

Lohit

On Sep 17, 2008, at 11:57 AM, Nathan Marz <[EMAIL PROTECTED]> wrote:

Hello all,

Is it bad to have a lot of files in a single HDFS directory (say, on
the order of hundreds of thousands)? Or should we split our files into
a directory structure of some sort?

Thanks,
Nathan Marz

