We have a compression utility that tries to grab all subdirs of a directory
on HDFS. It makes a call like this:
FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));

and handles files vs dirs accordingly.

We tried to run our utility against a dir containing a computed SOLR shard,
which has files that look like this:
-rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdt
-rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdx
-rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fnm
-rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/_ox.frq
-rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.nrm
-rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.prx
-rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.tii
-rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/_ox.tis
-rw-r--r--   2 hadoopuser visible         20 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/segments.gen
-rw-r--r--   2 hadoopuser visible        282 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/segments_2


The globStatus call only seems to pick up the last two files (segments.gen
and segments_2); the eight files whose names start with _ are not returned.

I've skimmed the FileSystem and GlobExpander source for anything related to
this but found nothing, and Google didn't turn up anything about
underscores. Am I misunderstanding something about the glob patterns needed
to pick these up, or am I unaware of some filename convention in HDFS?
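
To make the expectation concrete, here is a minimal sketch that runs the ten
file names from the listing above through a plain * glob. It uses the JDK's
java.nio glob matcher as a stand-in, not Hadoop's own glob code, so it only
illustrates what I assumed * would match. The class and method names
(GlobExpectation, matching) are made up for this example.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustration only: this is the JDK glob matcher, NOT Hadoop's globStatus.
final class GlobExpectation {
    // File names copied from the directory listing above.
    static final String[] NAMES = {
        "_ox.fdt", "_ox.fdx", "_ox.fnm", "_ox.frq", "_ox.nrm",
        "_ox.prx", "_ox.tii", "_ox.tis", "segments.gen", "segments_2"
    };

    // Returns the names from NAMES that the given glob pattern matches.
    static List<String> matching(String glob) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> hits = new ArrayList<>();
        for (String name : NAMES) {
            if (matcher.matches(Paths.get(name))) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Print whichever names the bare "*" pattern matches.
        System.out.println(matching("*"));
    }
}
```

With the JDK matcher every name matches, including the ones starting with _,
which is the behaviour I expected from globStatus as well.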
