Thanks Daniel. Please correct me if I have misunderstood, but according to the documentation at http://hadoop.apache.org/docs/r2.7.3/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Skipping_Bad_Records , the sole purpose of this feature seems to be tolerating unknown failures/exceptions in mappers/reducers. If I were able to catch all failures myself, I would not need this feature at all. Is that not true?
If I have understood it incorrectly, when would one use the feature to skip bad records?

Regards,
PW

On Thu, Apr 13, 2017 at 2:49 PM, Daniel Templeton <[email protected]> wrote:

> You have to modify wordcount-mapper-t1.py to just ignore the bad line. In
> the worst case, you should be able to do something like:
>
>     for line in sys.stdin:
>         try:
>             # Insert processing code here
>         except:
>             # Error processing record, ignore it
>             pass
>
> Daniel
>
>
> On 4/13/17 1:33 PM, Pillis W wrote:
>
>> Hello,
>> I am using 'hadoop-streaming.jar' to do a simple word count, and want to
>> skip records that fail execution. Below is the actual command I run, and
>> the mapper always fails on one record, and hence fails the job. The input
>> file is 3 lines with 1 bad line.
>>
>> hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -D mapred.job.name=SkipTest
>> -Dmapreduce.task.skip.start.attempts=1 -Dmapreduce.map.skip.maxrecords=1
>> -Dmapreduce.reduce.skip.maxgroups=1
>> -Dmapreduce.map.skip.proc.count.autoincr=false
>> -Dmapreduce.reduce.skip.proc.count.autoincr=false -D mapred.reduce.tasks=1
>> -D mapred.map.tasks=1 -files
>> /home/hadoop/wc/wordcount-mapper-t1.py,/home/hadoop/wc/wordcount-reducer-t1.py
>> -input /user/hadoop/data/test1 -output /user/hadoop/data/output-test-5
>> -mapper "python wordcount-mapper-t1.py" -reducer "python wordcount-reducer-t1.py"
>>
>>
>> I was wondering if skipping of records is supported when MapReduce is used
>> in streaming mode?
>>
>> Thanks in advance.
>> PW
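
For reference, here is a minimal sketch of how Daniel's try/except suggestion might look in a streaming word-count mapper. It is not the actual wordcount-mapper-t1.py, which is not shown in this thread. The function names (count_words, run_mapper) and the counter group and name (WordCount, SkippedRecords) are made up for illustration, and the tab-separated "word<TAB>1" output is an assumption about what the reducer expects. The counter line uses the "reporter:counter:<group>,<counter>,<amount>" convention that Hadoop Streaming reads from stderr.

```python
import sys


def count_words(line):
    """Turn one input line into (word, 1) pairs (hypothetical helper)."""
    return [(word, 1) for word in line.split()]


def run_mapper(lines, process=count_words, out=sys.stdout, err=sys.stderr):
    """Emit word counts for each line, skipping any line whose processing raises.

    Returns the number of skipped records.
    """
    skipped = 0
    for line in lines:
        try:
            pairs = process(line)
        except Exception as exc:
            # Bad record: note it on stderr and move on instead of failing the task.
            skipped += 1
            err.write("skipping bad record %r: %s\n" % (line, exc))
            # Hadoop Streaming treats stderr lines of this form as counter updates.
            # The group and counter names here are illustrative assumptions.
            err.write("reporter:counter:WordCount,SkippedRecords,1\n")
            continue
        for word, count in pairs:
            out.write("%s\t%d\n" % (word, count))
    return skipped


# In the real mapper script, finish with: run_mapper(sys.stdin)
```

Catching Exception rather than using a bare except lets KeyboardInterrupt and SystemExit still stop the script. This only covers failures that raise a Python exception; a crash of the interpreter itself would not be caught this way.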
