Hi,

I have a question about the practical limit on the number of files per
HDFS directory. (What's the hard limit, by the way?)

What is a practical limit on the number of files in a Hadoop directory
such that glob selection still works efficiently (by efficiently I mean
under 30 seconds)?

We have something like a daily log registry where every day is
represented by n files. We then use globs to define the inputs for MR
jobs that run over a certain period of time. (Usually a job selects
no more than a few days.)
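For concreteness, an input glob for a job covering a few days might look
something like this (the directory layout and dates here are just a
hypothetical illustration, not our actual paths):

```
/logs/daily/2011-06-{01,02,03}/part-*
```

The question is how large each daily directory (or the parent directory)
can grow before expanding a pattern like this becomes slow.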

So, do you have any heuristics for the number of files at which this
approach starts to become a problem?

Thank you very much in advance.

-Dmitriy
