On Fri, Sep 2, 2011 at 4:04 PM, Meng Mao <[email protected]> wrote:
> We have a compression utility that tries to grab all subdirs to a directory
> on HDFS. It makes a call like this:
> FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));
>
> and handles files vs dirs accordingly.
>
> We tried to run our utility against a dir containing a computed SOLR shard,
> which has files that look like this:
> -rw-r--r-- 2 hadoopuser visible 8538430603 2011-09-01 18:58
> /test/output/solr-20110901165238/part-00000/data/index/_ox.fdt
> -rw-r--r-- 2 hadoopuser visible 233396596 2011-09-01 18:57
> /test/output/solr-20110901165238/part-00000/data/index/_ox.fdx
> -rw-r--r-- 2 hadoopuser visible 130 2011-09-01 18:57
> /test/output/solr-20110901165238/part-00000/data/index/_ox.fnm
> -rw-r--r-- 2 hadoopuser visible 2147948283 2011-09-01 18:55
> /test/output/solr-20110901165238/part-00000/data/index/_ox.frq
> -rw-r--r-- 2 hadoopuser visible 87523726 2011-09-01 18:57
> /test/output/solr-20110901165238/part-00000/data/index/_ox.nrm
> -rw-r--r-- 2 hadoopuser visible 920936168 2011-09-01 18:57
> /test/output/solr-20110901165238/part-00000/data/index/_ox.prx
> -rw-r--r-- 2 hadoopuser visible 22619542 2011-09-01 18:58
> /test/output/solr-20110901165238/part-00000/data/index/_ox.tii
> -rw-r--r-- 2 hadoopuser visible 2070214402 2011-09-01 18:51
> /test/output/solr-20110901165238/part-00000/data/index/_ox.tis
> -rw-r--r-- 2 hadoopuser visible 20 2011-09-01 18:51
> /test/output/solr-20110901165238/part-00000/data/index/segments.gen
> -rw-r--r-- 2 hadoopuser visible 282 2011-09-01 18:55
> /test/output/solr-20110901165238/part-00000/data/index/segments_2
>
>
> The globStatus call seems only able to pick up those last 2 files; the
> several files that start with _ don't register.
>
> I've skimmed the FileSystem and GlobExpander source to see if there's
> anything related to this, but didn't see it. Google didn't turn up anything
> about underscores. Am I misunderstanding something about the regex patterns
> needed to pick these up or unaware of some filename convention in HDFS?
>
Files whose names start with '_' are considered 'hidden', like Unix files starting with '.'. I did not know that for a very long time, because not everyone follows this convention or even knows about it.
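
To make the convention concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class HiddenNames and its methods are hypothetical names made up for illustration; they are not part of the FileSystem API. The sketch only shows what treating a leading '_' or '.' as hidden means for the listing above; whether a particular Hadoop call applies such a filter depends on the API and version you are running.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a Hadoop class.
class HiddenNames {

    // True when the last path component starts with '_' or '.'.
    static boolean isHidden(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.startsWith("_") || name.startsWith(".");
    }

    // Keeps only the paths that are not hidden, preserving order.
    static List<String> visible(List<String> paths) {
        List<String> out = new ArrayList<>();
        for (String p : paths) {
            if (!isHidden(p)) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String dir = "/test/output/solr-20110901165238/part-00000/data/index/";
        List<String> listing = Arrays.asList(
            dir + "_ox.fdt",
            dir + "_ox.fdx",
            dir + "segments.gen",
            dir + "segments_2");
        // Only the names without a leading '_' or '.' survive the filter.
        for (String p : visible(listing)) {
            System.out.println(p);
        }
    }
}
```

Run against the names from the listing, only segments.gen and segments_2 pass the check, which matches the symptom described in the question. The check looks at the final path component only, so an underscore elsewhere in the name (as in segments_2) does not hide the file.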
