Latha
Mon, 06 Oct 2008 11:52:42 -0700
Hi Olga,

How can I load individual files from a directory structure at the grunt shell?

"bin/hadoop dfs -lsr" lists all the files in HDFS irrespective of how deep they sit in the directory tree. [It also lists directories :( ] However, the Pig grunt shell supports the dfs "ls" command but not "lsr", so it is not possible to get all the filenames there. It lists only the top-level directories or files available in HDFS.

Please correct me if I am wrong.

Rgds,
Srilatha

On Mon, Oct 6, 2008 at 9:11 PM, Olga Natkovich <[EMAIL PROTECTED]> wrote:
> Metadata like filename is not preserved when the data is loaded. You can
> load individual files and then use union command but that will run
> slower because of extra processing steps.
>
> Olga
>
> > -----Original Message-----
> > From: Latha [EMAIL PROTECTED]
> > Sent: Sunday, October 05, 2008 10:35 AM
> > To: pig-user@incubator.apache.org
> > Subject: How to access filenames after loading a directory to
> > an Alias [pig scripting]
> >
> > Greetings!
> > Hi, when I load a directory (from HDFS) into an alias and
> > try to dump it, I find all the lines of the various files in that
> > directory appearing one after another.
> > However, I am not able to figure out how to access filenames from
> > the alias. I tried understanding script1-hadoop.pig. Still, I am not
> > able to find out how.
> >
> > A = load "inputDir" using PigStorage();
> > dump A;
> > Output:
> > ------------------------------------------------
> > (line 1 from inputDir/insideDir/file1.txt)
> > (line 2 from inputDir/insideDir/file1.txt)
> > (line 1 from inputDir/insideDir/innermost/fileone.txt)
> > ...
> > etc.,
> > ------------------------------------------------
> >
> > I am interested in filewise results, where I can retain the
> > filename and get the results filewise.
> >
> > filename1
> > (line 1)
> > (line 2)
> >
> > filename2
> > (line 1)
> > (line 2)
> > etc.,
> >
> > Is there any way I can access filenames from the alias to which a
> > directory is loaded? The requirement is to iterate through all
> > the files and, in each file, process every
> > line. Please point me to the right approach.
> >
> > Regards,
> > Srilatha
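
One way to combine the two ideas in this thread (take the recursive listing from "bin/hadoop dfs -lsr" outside grunt, then load each file individually and union the results) might look like the Python sketch below. It is only an illustration under assumptions, not a tested recipe: it assumes each listing line starts with a permission string ("d" for directories, "-" for files) and ends with the path, which may not match the output of every Hadoop release; the helper names files_from_lsr and pig_script, the sample listing, and the /user/latha prefix are all hypothetical; and the generated Pig Latin (single-quoted paths, a constant filename column, UNION) has not been checked against the Pig release discussed here.

```python
# Hypothetical sample of `bin/hadoop dfs -lsr inputDir` output.
# Assumed layout: permissions first, path last; real output may differ.
SAMPLE_LISTING = """\
drwxr-xr-x   - latha supergroup     0 2008-10-06 11:00 /user/latha/inputDir/insideDir
-rw-r--r--   3 latha supergroup   120 2008-10-06 11:00 /user/latha/inputDir/insideDir/file1.txt
drwxr-xr-x   - latha supergroup     0 2008-10-06 11:00 /user/latha/inputDir/insideDir/innermost
-rw-r--r--   3 latha supergroup    80 2008-10-06 11:00 /user/latha/inputDir/insideDir/innermost/fileone.txt
"""


def files_from_lsr(listing):
    """Return the file paths in a recursive listing, skipping directories."""
    paths = []
    for line in listing.splitlines():
        fields = line.split()
        # Keep only lines whose first field looks like a plain-file
        # permission string ("-rw-r--r--"); directories start with "d".
        if len(fields) < 2 or not fields[0].startswith("-"):
            continue
        # Assumes paths contain no whitespace.
        paths.append(fields[-1])
    return paths


def pig_script(paths):
    """Build Pig Latin that loads each file and tags rows with its path."""
    lines = []
    for i, path in enumerate(paths):
        lines.append(f"raw{i} = LOAD '{path}' USING PigStorage();")
        # Prepend the path as a constant column so it survives the union.
        lines.append(f"tagged{i} = FOREACH raw{i} GENERATE '{path}' AS filename, *;")
    if len(paths) > 1:
        aliases = ", ".join(f"tagged{i}" for i in range(len(paths)))
        lines.append(f"all_files = UNION {aliases};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(pig_script(files_from_lsr(SAMPLE_LISTING)))
```

The generated statements could then be pasted into grunt or saved as a script. As Olga notes, loading files one by one and unioning them adds processing steps, so this is likely to run slower than loading the whole directory at once.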