Hi, I have a question about the practical limit on the number of files per HDFS directory. (What's the hard limit, by the way?)
What is a practical limit on the number of files in a Hadoop directory such that glob selection still works efficiently (by "efficiently" I mean under 30 seconds)? We have something like a daily log registry, where every day is represented by n files. We then use globs to define the inputs for MR jobs that run over a certain period of time (usually a job selects no more than a few days; there's a rough sketch of what we do in the P.S. below). So, do you have any heuristics for the number of files at which this approach becomes a problem? Thank you very much in advance. -Dmitriy
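
P.S. For context, here is a minimal sketch of how we drive input selection; the /logs/YYYY/MM/DD/part-* layout, the dates, and the GlobTiming class name are just illustrative, not our actual setup:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobTiming {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Expand a glob covering three days of the daily registry;
        // this path layout is hypothetical.
        Path pattern = new Path("/logs/2011/01/{13,14,15}/part-*");

        long start = System.currentTimeMillis();
        FileStatus[] matches = fs.globStatus(pattern);
        long elapsed = System.currentTimeMillis() - start;

        // globStatus may return null if nothing matches the pattern.
        int n = (matches == null) ? 0 : matches.length;
        System.out.println(n + " files matched in " + elapsed + " ms");
    }
}

The MR jobs themselves just pass the same kind of pattern to FileInputFormat, so the glob expansion above is essentially the step whose cost I'm asking about.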