I get the opposite behavior --

[this is more or less how I listed the files in the original email]
hadoop dfs -ls /test/output/solr-20110901165238/part-00000/data/index/*
-rw-r--r-- 2 hadoopuser visible 8538430603 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdt
-rw-r--r-- 2 hadoopuser visible 233396596 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdx
-rw-r--r-- 2 hadoopuser visible 130 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fnm
-rw-r--r-- 2 hadoopuser visible 2147948283 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/_ox.frq
-rw-r--r-- 2 hadoopuser visible 87523726 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.nrm
-rw-r--r-- 2 hadoopuser visible 920936168 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.prx
-rw-r--r-- 2 hadoopuser visible 22619542 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.tii
-rw-r--r-- 2 hadoopuser visible 2070214402 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/_ox.tis
-rw-r--r-- 2 hadoopuser visible 20 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/segments.gen
-rw-r--r-- 2 hadoopuser visible 282 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/segments_2

Whereas my globStatus doesn't capture them.

I thought we were on Cloudera's CDH3, but now I'm not sure. This is what
version reports:

$ hadoop version
Hadoop 0.20.1+169.56
Subversion -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3
Compiled by root on Tue Feb 9 13:40:08 EST 2010

On Fri, Sep 2, 2011 at 11:45 PM, Harsh J <[email protected]> wrote:

> Meng,
>
> What version of hadoop are you on? I'm able to use globStatus(Path)
> for '_' listing successfully, with a '*' glob. Although the same
> doesn't apply to what FsShell's ls utility provide (which is odd
> here!).
>
> Here's my test code which can validate that the listing is indeed
> done: http://pastebin.com/vCbd2wmK
>
> $ hadoop dfs -ls
> Found 4 items
> drwxr-xr-x - harshchouraria supergroup 0 2011-09-03 09:09 /user/harshchouraria/_abc
> -rw-r--r-- 1 harshchouraria supergroup 0 2011-09-03 09:10 /user/harshchouraria/_def
> drwxr-xr-x - harshchouraria supergroup 0 2011-09-03 08:10 /user/harshchouraria/abc
> -rw-r--r-- 1 harshchouraria supergroup 0 2011-09-03 09:10 /user/harshchouraria/def
>
> $ hadoop dfs -ls '*'
> -rw-r--r-- 1 harshchouraria supergroup 0 2011-09-03 09:10 /user/harshchouraria/_def
> -rw-r--r-- 1 harshchouraria supergroup 0 2011-09-03 09:10 /user/harshchouraria/def
>
> $ # No dir results! ^^
>
> $ hadoop jar myjar.jar # (My code)
> hdfs://localhost/user/harshchouraria/_abc
> hdfs://localhost/user/harshchouraria/_def
> hdfs://localhost/user/harshchouraria/abc
> hdfs://localhost/user/harshchouraria/def
>
> I suppose that means globStatus is fine, but the FsShell.ls(…) code
> does something more than a simple glob status, and filters away
> directory results when used with a glob.
>
> On Sat, Sep 3, 2011 at 3:07 AM, Meng Mao <[email protected]> wrote:
> > Is there a programmatic way to access these hidden files then?
> >
> > On Fri, Sep 2, 2011 at 5:20 PM, Edward Capriolo <[email protected]> wrote:
> >
> >> On Fri, Sep 2, 2011 at 4:04 PM, Meng Mao <[email protected]> wrote:
> >>
> >> > We have a compression utility that tries to grab all subdirs to a
> >> > directory on HDFS. It makes a call like this:
> >> > FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));
> >> >
> >> > and handles files vs dirs accordingly.
> >> >
> >> > We tried to run our utility against a dir containing a computed SOLR
> >> > shard, which has files that look like this:
> >> > -rw-r--r-- 2 hadoopuser visible 8538430603 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdt
> >> > -rw-r--r-- 2 hadoopuser visible 233396596 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdx
> >> > -rw-r--r-- 2 hadoopuser visible 130 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fnm
> >> > -rw-r--r-- 2 hadoopuser visible 2147948283 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/_ox.frq
> >> > -rw-r--r-- 2 hadoopuser visible 87523726 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.nrm
> >> > -rw-r--r-- 2 hadoopuser visible 920936168 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.prx
> >> > -rw-r--r-- 2 hadoopuser visible 22619542 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.tii
> >> > -rw-r--r-- 2 hadoopuser visible 2070214402 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/_ox.tis
> >> > -rw-r--r-- 2 hadoopuser visible 20 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/segments.gen
> >> > -rw-r--r-- 2 hadoopuser visible 282 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/segments_2
> >> >
> >> > The globStatus call seems only able to pick up those last 2 files; the
> >> > several files that start with _ don't register.
> >> >
> >> > I've skimmed the FileSystem and GlobExpander source to see if there's
> >> > anything related to this, but didn't see it. Google didn't turn up
> >> > anything about underscores. Am I misunderstanding something about the
> >> > regex patterns needed to pick these up or unaware of some filename
> >> > convention in HDFS?
> >> >
> >>
> >> Files starting with '_' are considered 'hidden' like unix files starting
> >> with '.'. I did not know that for a very long time because not everyone
> >> follows this rule or even knows about it.
> >>
> >
>
> --
> Harsh J
>
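
For reference, the naming convention Edward describes (names starting with '_' or '.' are treated as hidden) can be sketched in plain Java with no Hadoop dependency. The class and method names below are hypothetical and are not part of any Hadoop API. Nothing in this thread establishes that globStatus itself applies such a filter (Harsh's test suggests it does not), so this only illustrates the convention and the symptom it would produce: of the names in the listing, only segments.gen and segments_2 would survive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the "hidden name" convention from the thread.
// This is NOT Hadoop code; it only models the rule on plain file names.
public class HiddenNameFilter {

    // A name is "hidden" if it starts with '_' or '.' (the convention Edward describes).
    public static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Returns the names that would remain after dropping hidden ones, in input order.
    public static List<String> visible(List<String> names) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (!isHidden(name)) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A few of the file names from the index directory listed in this thread.
        List<String> names = Arrays.asList(
                "_ox.fdt", "_ox.fdx", "_ox.fnm", "segments.gen", "segments_2");
        System.out.println(visible(names)); // prints [segments.gen, segments_2]
    }
}
```

Note that segments_2 is kept: the rule looks only at the first character of the name, so an underscore elsewhere in the name does not hide the file.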
