Raghu Rajagopalan
Wed, 02 Jul 2008 13:05:31 -0700
Okay, I tried some more tests with a portion of the same input file:

1. With a 10 MB input file: 1 map task (based on the DFS block size of 64 MB). Pig + Hadoop runs successfully. All map tasks complete when they reach 100%. Normal, expected behavior.
2. With a 300 MB input file: 5 map tasks. Ran successfully, BUT there were many instances where map tasks would not complete on reaching 100%. However, when I let them continue for some time, they were eventually marked complete. On the console, Pig's progress output seems stuck for quite some time and then eventually moves forward as the map tasks complete. Overall job execution took about 20 minutes.

I'm quite okay with looking at the source to find out what gives, but I don't know where to start poking. Any help to get me off the ground would be great.

Thanks!
Raghu

On Tue, Jul 1, 2008 at 12:13 PM, Raghu Rajagopalan <[EMAIL PROTECTED]> wrote:
> Hi,
> I wrote a small Pig script with a couple of functions, and it works
> fine in local mode.
> However, when I run it on a Hadoop cluster on a 4 GB file (an Apache
> access log), the job is submitted successfully and the input is split
> into 66 map tasks (64 MB chunk size). On my cluster of 10 machines, the
> first 10 maps commence; however, they do not seem to terminate
> (progress goes to 1200% on the Hadoop MapReduce tasks). I don't see
> anything untoward in the logs either.
>
> On the command line, Pig's progress indicator sysouts continue indefinitely.
>
> The Pig script and the referenced functions are attached. I'm wondering if
> anyone has seen anything similar and/or knows of any steps needed to fix this.
>
> CsvLogStorage.java - load function using opencsv to parse the Apache log
> REGEX.java - regex splitter that outputs a tuple for a given regex
> SPLITDATE.java - parses a date and outputs a tuple with the given date parts
>
> My guess is that there's something wrong with the way the custom load
> function is written.
>
> My setup:
> Hadoop 0.17
> Pig.jar from the pigtutorial.tar.gz on the wiki.
>
> Thanks for looking.
> Raghu
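
A side note on the map task counts mentioned in this thread (1 task for the 10 MB file, 5 for the 300 MB file, 66 for the roughly 4 GB file): they are consistent with one map task per full or partial 64 MB block. The sketch below only illustrates that arithmetic. The class and method names are hypothetical and are not part of Hadoop or Pig, and it assumes the simplest possible rule (ceiling of file size over block size), which may differ from how Hadoop actually computes splits.

```java
// Hypothetical helper for illustration only; not a Hadoop or Pig API.
class SplitEstimate {

    // Assumed rule: one split per full or partial block,
    // i.e. ceil(fileBytes / blockBytes) using integer arithmetic.
    static long numSplits(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative, block size positive");
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 64 * mb; // block size mentioned in the thread

        System.out.println(numSplits(10 * mb, block));   // 10 MB input
        System.out.println(numSplits(300 * mb, block));  // 300 MB input
        System.out.println(numSplits(4096 * mb, block)); // exactly 4 GiB input
    }
}
```

Under this assumption, the 10 MB and 300 MB cases give 1 and 5 splits, matching the tests above. A file of exactly 4 GiB would give 64, so the reported 66 map tasks suggest the access log is somewhat larger than 4 GiB.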