Meng,

What version of Hadoop are you on? I'm able to use globStatus(Path) to list '_'-prefixed entries successfully with a '*' glob. The same doesn't hold for what FsShell's ls utility provides, though (which is odd here!).
Here's my test code which can validate that the listing is indeed done: http://pastebin.com/vCbd2wmK

$ hadoop dfs -ls
Found 4 items
drwxr-xr-x - harshchouraria supergroup 0 2011-09-03 09:09 /user/harshchouraria/_abc
-rw-r--r-- 1 harshchouraria supergroup 0 2011-09-03 09:10 /user/harshchouraria/_def
drwxr-xr-x - harshchouraria supergroup 0 2011-09-03 08:10 /user/harshchouraria/abc
-rw-r--r-- 1 harshchouraria supergroup 0 2011-09-03 09:10 /user/harshchouraria/def
$ hadoop dfs -ls '*'
-rw-r--r-- 1 harshchouraria supergroup 0 2011-09-03 09:10 /user/harshchouraria/_def
-rw-r--r-- 1 harshchouraria supergroup 0 2011-09-03 09:10 /user/harshchouraria/def
$ # No dir results! ^^
$ hadoop jar myjar.jar # (My code)
hdfs://localhost/user/harshchouraria/_abc
hdfs://localhost/user/harshchouraria/_def
hdfs://localhost/user/harshchouraria/abc
hdfs://localhost/user/harshchouraria/def

I suppose that means globStatus is fine, but the FsShell.ls(…) code does something more than a simple glob status, and filters away directory results when used with a glob.

On Sat, Sep 3, 2011 at 3:07 AM, Meng Mao <[email protected]> wrote:
> Is there a programmatic way to access these hidden files then?
>
> On Fri, Sep 2, 2011 at 5:20 PM, Edward Capriolo <[email protected]> wrote:
>
>> On Fri, Sep 2, 2011 at 4:04 PM, Meng Mao <[email protected]> wrote:
>>
>> > We have a compression utility that tries to grab all subdirs to a directory
>> > on HDFS. It makes a call like this:
>> > FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));
>> >
>> > and handles files vs dirs accordingly.
>> >
>> > We tried to run our utility against a dir containing a computed SOLR shard,
>> > which has files that look like this:
>> > -rw-r--r-- 2 hadoopuser visible 8538430603 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdt
>> > -rw-r--r-- 2 hadoopuser visible 233396596 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdx
>> > -rw-r--r-- 2 hadoopuser visible 130 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fnm
>> > -rw-r--r-- 2 hadoopuser visible 2147948283 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/_ox.frq
>> > -rw-r--r-- 2 hadoopuser visible 87523726 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.nrm
>> > -rw-r--r-- 2 hadoopuser visible 920936168 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.prx
>> > -rw-r--r-- 2 hadoopuser visible 22619542 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.tii
>> > -rw-r--r-- 2 hadoopuser visible 2070214402 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/_ox.tis
>> > -rw-r--r-- 2 hadoopuser visible 20 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/segments.gen
>> > -rw-r--r-- 2 hadoopuser visible 282 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/segments_2
>> >
>> >
>> > The globStatus call seems only able to pick up those last 2 files; the
>> > several files that start with _ don't register.
>> >
>> > I've skimmed the FileSystem and GlobExpander source to see if there's
>> > anything related to this, but didn't see it. Google didn't turn up anything
>> > about underscores. Am I misunderstanding something about the regex patterns
>> > needed to pick these up or unaware of some filename convention in HDFS?
>> >
>>
>> Files starting with '_' are considered 'hidden' like unix files starting
>> with '.'. I did not know that for a very long time because not everyone
>> follows this rule or even knows about it.
>>
>

--
Harsh J
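
For anyone who wants to separate the two kinds of entries after globbing, here is a minimal sketch of the naming convention Edward describes. It assumes "hidden" simply means that the last path component starts with '_' or '.'. HiddenNames and its methods are hypothetical names made up for illustration, not Hadoop APIs, and this is not the pastebin code referenced above. It works on plain path strings, so with real globStatus results you would feed it something like status.getPath().toString().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
final class HiddenNames {
    private HiddenNames() {}

    // Returns the last path component, e.g. "/a/b/_ox.fdt" -> "_ox.fdt".
    static String baseName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    // Assumption: an entry is "hidden" when its name starts with '_' or '.'.
    static boolean isHidden(String path) {
        String name = baseName(path);
        return name.startsWith("_") || name.startsWith(".");
    }

    // Returns the paths whose hidden-ness equals the 'hidden' flag, in input order.
    static List<String> select(List<String> paths, boolean hidden) {
        List<String> out = new ArrayList<>();
        for (String p : paths) {
            if (isHidden(p) == hidden) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Paths taken from the listings quoted in this thread.
        List<String> paths = Arrays.asList(
            "/user/harshchouraria/_abc",
            "/user/harshchouraria/def",
            "/test/output/solr-20110901165238/part-00000/data/index/_ox.fdt",
            "/test/output/solr-20110901165238/part-00000/data/index/segments_2");
        System.out.println("hidden:  " + select(paths, true));
        System.out.println("visible: " + select(paths, false));
    }
}
```

Only the leading character of the file name is checked, so segments_2 counts as visible even though it contains an underscore.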
