Olga Natkovich
Mon, 06 Oct 2008 14:50:31 -0700
That is true. Pig currently does not support that.

Olga

> -----Original Message-----
> From: Latha [EMAIL PROTECTED]
> Sent: Monday, October 06, 2008 11:52 AM
> To: pig-user@incubator.apache.org
> Subject: Re: How to access filenames after loading a
> directory to an Alias [pig scripting]
>
> Hi Olga,
>
> How can I load individual files from a directory
> structure at the grunt shell?
>
> "bin/hadoop dfs -lsr" lists all the files in HDFS,
> irrespective of how deep each file sits in the directory tree.
> [It also lists directories :( ]
>
> However, the Pig grunt shell supports the dfs "ls" command, not the "lsr"
> command, so it is not possible to get all the filenames there. It lists only
> the top-level directories or files in HDFS.
> Please correct me if I am wrong.
>
> Rgds,
> Srilatha
>
>
> On Mon, Oct 6, 2008 at 9:11 PM, Olga Natkovich
> <[EMAIL PROTECTED]> wrote:
>
> > Metadata like the filename is not preserved when the data is loaded. You
> > can load individual files and then use the union command, but that will
> > run slower because of the extra processing steps.
> >
> > Olga
> >
> > > -----Original Message-----
> > > From: Latha [EMAIL PROTECTED]
> > > Sent: Sunday, October 05, 2008 10:35 AM
> > > To: pig-user@incubator.apache.org
> > > Subject: How to access filenames after loading a directory to an
> > > Alias [pig scripting]
> > >
> > > Greetings!
> > > Hi, when I load a directory (from HDFS) into an alias and try to
> > > dump it, I find all the lines of the various files in that directory
> > > appearing one after another.
> > > However, I am not able to figure out how to access the filenames from
> > > the alias. I tried to understand script1-hadoop.pig, but I still could
> > > not work it out.
> > >
> > > A = load "inputDir" using PigStorage();
> > > dump A;
> > >
> > > Output:
> > > ------------------------------------------------
> > > (line 1 from inputDir/insideDir/file1.txt)
> > > (line 2 from inputDir/insideDir/file1.txt)
> > > .
> > > (line 1 from inputDir/insideDir/innermost/fileone.txt)
> > > ...
> > > etc.
> > > ------------------------------------------------
> > >
> > > I am interested in file-wise results, where I can retain the filename
> > > and get the results file by file:
> > >
> > > filename1
> > > (line 1)
> > > (line 2)
> > >
> > > filename2
> > > (line 1)
> > > (line 2)
> > > etc.
> > >
> > > Is there any way I can access the filenames from an alias to which a
> > > directory is loaded? The requirement is to iterate through all the
> > > files and, in each file, process every line. Please point me to the
> > > right approach.
> > >
> > > Regards,
> > > Srilatha
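A minimal sketch of the workaround Olga describes in her first reply (load the files individually, then union them). It is written in current Pig Latin syntax (single-quoted strings, typed AS schemas), which may differ from the 2008 release discussed in this thread. The file paths are the example paths from Srilatha's output. The relation names (f1, t1, all_lines, by_file) and the filename field are hypothetical. Because the loader does not record where a line came from, each file's name is attached by hand as a constant column before the union.

```pig
-- Sketch only: paths come from the example output in this thread;
-- relation and field names are hypothetical.
-- TextLoader() reads each line as a single chararray field, so tabs
-- inside a line do not split it (PigStorage() would split on tabs).
f1 = LOAD 'inputDir/insideDir/file1.txt' USING TextLoader() AS (line:chararray);
f2 = LOAD 'inputDir/insideDir/innermost/fileone.txt' USING TextLoader() AS (line:chararray);

-- The loader does not keep the filename, so attach it as a constant column.
t1 = FOREACH f1 GENERATE 'inputDir/insideDir/file1.txt' AS filename, line;
t2 = FOREACH f2 GENERATE 'inputDir/insideDir/innermost/fileone.txt' AS filename, line;

-- Combine the tagged relations into one relation.
all_lines = UNION t1, t2;

-- One output tuple per file: (filename, {(filename, line), ...}).
by_file = GROUP all_lines BY filename;
DUMP by_file;
```

Each file needs its own LOAD and FOREACH pair, so with many files the script has to be generated from a file listing. UNION and GROUP do not guarantee that lines inside each file's bag keep their original order.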