Olga Natkovich
Mon, 06 Oct 2008 08:43:10 -0700
Metadata such as the filename is not preserved when the data is loaded. You can load the individual files and then use the union command, but that will run slower because of the extra processing steps.

Olga

> -----Original Message-----
> From: Latha [EMAIL PROTECTED]
> Sent: Sunday, October 05, 2008 10:35 AM
> To: pig-user@incubator.apache.org
> Subject: How to access filenames after loading a directory to
> an Alias [pig scripting]
>
> Greetings!
> Hi, when I load a directory (from hdfs) into an alias and
> try to dump it, I find all the lines of the various files in that
> directory appearing one after another.
> However, I am not able to figure out how to access the filenames from
> the alias. I tried understanding script1-hadoop.pig. Still, I am not
> able to find out the same.
>
> A = load "inputDir" using PigStorage();
> dump A;
> Output:
> ------------------------------------------------
> ( line1 from inputDir/insideDir/file1.txt)
> ( line 2 from inputDir/insideDir/file1.txt)
> .
> (line 1 from inputDir/insideDir/innermost/fileone.txt)
> ...
> etc.,
> ------------------------------------------------
>
> I am interested in filewise results, where I can retain the
> filename and get the results filewise.
>
> filename1
> ( line1 )
> ( line2 )
>
> filename2
> (line 1)
> (line 2)
> etc.,
>
> Is there any way I can access filenames from the alias to which a
> directory is loaded? The requirement is to iterate through all
> the files and, in each file, process every
> line. Please point me to the right approach.
>
> Regards,
> Srilatha
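
For reference, the load-and-union workaround described in the reply could look roughly like the sketch below. It is an illustration under stated assumptions, not a tested script: the two paths are taken from the sample output in the question, the aliases (A1, T1, ALL_LINES, and so on) and the "filename" field are made up for the example, and TextLoader is used in place of PigStorage so that each line arrives as a single field. The filename is attached by hand as a constant, since the loader does not provide it, and the AS (line:chararray) schema syntax may need adjusting for older Pig releases.

```pig
-- Load each file separately (paths assumed from the question's sample output).
A1 = LOAD 'inputDir/insideDir/file1.txt' USING TextLoader() AS (line:chararray);
A2 = LOAD 'inputDir/insideDir/innermost/fileone.txt' USING TextLoader() AS (line:chararray);

-- Attach the filename to every line as a constant field.
T1 = FOREACH A1 GENERATE 'file1.txt' AS filename, line;
T2 = FOREACH A2 GENERATE 'fileone.txt' AS filename, line;

-- Combine the tagged relations; this is the extra step that costs time.
ALL_LINES = UNION T1, T2;

-- Group by filename to get filewise results.
BY_FILE = GROUP ALL_LINES BY filename;
DUMP BY_FILE;
```

Each LOAD and FOREACH pair has to be written (or generated) once per file, so this only scales to directories with a modest number of files.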