Hi,

I have a question about the practical limit on the number of files per
HDFS directory. (What's the hard limit, by the way?)

What is a practical limit on the number of files in a Hadoop directory
such that glob selection still works efficiently (by efficiently I mean
under 30 seconds)?

We have something like a daily log registry where every day is
represented by n files. We then use globs to define the inputs for MR
jobs that run over a certain period of time. (Usually a job selects
no more than a few days.)
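For concreteness, an input glob for a job covering a few days might look
something like this (the directory layout and dates here are just a
hypothetical illustration, not our actual paths):

```
/logs/daily/2011-06-{01,02,03}/part-*
```

The question is how large each daily directory (or the parent directory)
can grow before expanding a pattern like this becomes slow.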

So, do you have any heuristics for the number of files at which this
approach starts to become a problem?

Thank you very much in advance.

-Dmitriy
