Latha
Sun, 05 Oct 2008 10:36:12 -0700
Greetings!

Hi,

When I load a directory (from HDFS) into an alias and dump it, the lines of all the files in that directory appear one after another. However, I am not able to figure out how to access the filenames from the alias. I tried to understand script1-hadoop.pig, but still could not find out how to do this.

    A = load 'inputDir' using PigStorage();
    dump A;

Output:
------------------------------------------------
(line 1 from inputDir/insideDir/file1.txt)
(line 2 from inputDir/insideDir/file1.txt)
.
(line 1 from inputDir/insideDir/innermost/fileone.txt)
... etc.,
------------------------------------------------

I am interested in file-wise results, where I can retain the filename and get the results per file:

filename1
(line 1)
(line 2)
filename2
(line 1)
(line 2)
etc.,

Is there any way I can access the filenames from an alias into which a directory is loaded? The requirement is to iterate through all the files and, within each file, process every line. Please point me to the right approach.

Regards,
Srilatha