Thanks! I'll go with keeping my sanity then. The files will all be >= 64MB.
Take care,
-stu

-----Original Message-----
From: Allen Wittenauer <awittena...@linkedin.com>
Date: Wed, 18 Aug 2010 01:00:42
To: hdfs-user@hadoop.apache.org
Reply-To: hdfs-user@hadoop.apache.org
Subject: Re: Maximum number of files in directory? (in hdfs)

On Aug 17, 2010, at 5:44 PM, Stuart Smith wrote:

> I started to break the files into subdirectories out of habit (from working
> on ntfs/etc), but it occurred to me that maybe (from a performance
> perspective), it doesn't really matter on hdfs.
>
> Does it? Is there some recommended limit on the number of files to store in
> one directory on hdfs? I'm thinking thousands to millions, so we're not
> talking about INT_MAX or anything, but a lot.
>
> Or is it only limited by my sanity :) ?

We have a directory with several thousand files in it. It is always a pain when we hit it because the client heap size needs to be increased to do anything in it: directory listings, web UIs, distcp, etc. Doing any sort of manipulation in that dir is also slower.

My recommendation: don't do it. Directories, AFAIK, are relatively cheap resource-wise vs. lots of files in one.

[Hopefully these files are large. Otherwise they should be joined together... if not, you're going to take a performance hit processing them *and* storing them... ]
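[Editor's note: to make the "use subdirectories" advice concrete, here is a minimal sketch, not from the original thread, of fanning writes out into hash-based bucket subdirectories with the Hadoop FileSystem API so that no single directory accumulates millions of entries. The bucket count of 256, the /data/files root, and the BucketedWriter class are illustrative assumptions, not anything prescribed by HDFS.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BucketedWriter {
    // Illustrative fan-out; tune it so each directory stays in the low thousands of entries.
    private static final int NUM_BUCKETS = 256;

    // Map a file name to a stable bucket subdirectory, e.g. /data/files/3f/blob-12345.bin
    static Path bucketedPath(Path root, String fileName) {
        int bucket = (fileName.hashCode() & 0x7fffffff) % NUM_BUCKETS;
        return new Path(new Path(root, String.format("%02x", bucket)), fileName);
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path root = new Path("/data/files");          // hypothetical target directory
        Path target = bucketedPath(root, "blob-12345.bin");
        fs.mkdirs(target.getParent());                // directories themselves are cheap in HDFS
        try (FSDataOutputStream out = fs.create(target)) {
            out.writeBytes("payload goes here");      // stand-in for the real >= 64MB content
        }
    }
}

[If you do get stuck manipulating an already over-full directory, the usual stopgap is to raise the client JVM heap, e.g. export HADOOP_CLIENT_OPTS="-Xmx2g" before running hadoop fs -ls, but restructuring the layout as sketched above is the real fix.]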