Re: FileSystem.listStatus() on S3

Doug Cutting Thu, 29 May 2008 11:45:03 -0700

This is discussed in:

https://issues.apache.org/jira/browse/HADOOP-3095


If this gets fixed in the next week it will make it into 0.18.

Doug

Kyle Sampson wrote:

We're using Hadoop 0.17 with S3 as the filesystem. We've created a custom InputFormat for our data. One of the things it needs to do is on InputFormat.getSplits() list all of the files and directories under a certain path, and there may be thousands of entries in there. It's using FileSystem.listStatus() to get these paths. With S3, this is turning out to be extraordinarily slow with directories that contain on the order of thousands of subdirectories and files.
Looking into it a bit, it seems listStatus() is making a call to S3 for every subdirectory or file found to get extra file status information. It seems there used to be a listPaths() method that would just get the paths, but that's been deprecated and removed. Is there any way currently to get just a list of paths without status information?
Kyle Sampson
[EMAIL PROTECTED]
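
To illustrate the cost pattern described in the message above, here is a minimal sketch in plain Java. It does not use the Hadoop or S3 APIs: CountingStore, listWithStatus and listPathsOnly are hypothetical names made up for this example, and the store simply counts simulated round trips. It assumes that a listing costs one remote call and that fetching status costs one further call per entry, which is the behaviour the message reports rather than something verified against the Hadoop source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ListingCostSketch {

    /** Hypothetical stand-in for a remote object store; counts round trips. */
    static class CountingStore {
        private final Map<String, Long> objects = new TreeMap<>(); // key -> length in bytes
        int remoteCalls = 0;

        /** Local setup only; not counted as a remote call. */
        void put(String key, long length) {
            objects.put(key, length);
        }

        /** Simulates one round trip that returns every key under a prefix. */
        List<String> listKeys(String prefix) {
            remoteCalls++;
            List<String> keys = new ArrayList<>();
            for (String key : objects.keySet()) {
                if (key.startsWith(prefix)) {
                    keys.add(key);
                }
            }
            return keys;
        }

        /** Simulates one round trip that fetches metadata for a single key. */
        long fetchLength(String key) {
            remoteCalls++;
            return objects.get(key);
        }
    }

    /** Listing plus per-entry status: 1 call for the listing, then 1 per entry. */
    static Map<String, Long> listWithStatus(CountingStore store, String prefix) {
        Map<String, Long> result = new TreeMap<>();
        for (String key : store.listKeys(prefix)) {
            result.put(key, store.fetchLength(key));
        }
        return result;
    }

    /** Paths only: a single call, whatever the number of entries. */
    static List<String> listPathsOnly(CountingStore store, String prefix) {
        return store.listKeys(prefix);
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        for (int i = 0; i < 1000; i++) {
            store.put("data/part-" + i, 64L);
        }

        listWithStatus(store, "data/");
        System.out.println("with status: " + store.remoteCalls + " calls"); // 1001

        store.remoteCalls = 0;
        listPathsOnly(store, "data/");
        System.out.println("paths only: " + store.remoteCalls + " calls"); // 1
    }
}
```

With 1000 entries the first variant makes 1001 simulated calls and the second makes 1, which is why a paths-only listing matters so much when each call is a network round trip.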
