If the third-party library is called from your Map() function, you can simply 
catch the appropriate exceptions, skip emitting that record, and return from 
Map() normally.
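For illustration, that pattern looks like the following. This is a minimal, self-contained sketch of the idea (no Hadoop types); parseRecord stands in for the third-party call and is made up here, as is the record format:

```java
import java.util.ArrayList;
import java.util.List;

public class SkipOnError {

    // Stand-in for the third-party parse that may throw on bad input.
    static int parseRecord(String line) {
        return Integer.parseInt(line.trim());
    }

    // The map() body: catch the failure, emit nothing for that record,
    // and carry on normally with the rest of the input.
    static List<Integer> map(List<String> lines) {
        List<Integer> out = new ArrayList<>();
        for (String line : lines) {
            try {
                out.add(parseRecord(line));
            } catch (NumberFormatException e) {
                // Bad record: optionally log it or bump a counter, then skip.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map(List.of("1", "oops", "3")));
    }
}
```

In a real Mapper you would catch the specific exceptions the library documents (and perhaps increment a Hadoop counter for visibility) rather than a blanket catch.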

--Aaron
-----Original Message-----
From: Maheshwaran Janarthanan [mailto:[email protected]]
Sent: Tuesday, August 09, 2011 10:28 AM
To: HADOOP USERGROUP
Subject: Skipping Bad Records in M/R Job


Hi,

I have written a MapReduce job that uses third-party libraries to process 
unseen data, and the job fails because of errors in some records.

I came across the 'Skipping Bad Records' feature in Hadoop Map/Reduce. Can anyone send 
me a code snippet that enables this feature by setting properties on JobConf?

Thanks,
Ashwin!
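[For reference: with the old mapred API (Hadoop 0.19+), record skipping is usually configured through the SkipBadRecords helper rather than by setting raw property names. A minimal, unverified configuration sketch — MyJob is a placeholder and the values shown are illustrative:

```java
JobConf conf = new JobConf(MyJob.class);  // MyJob is a placeholder

// Tolerate up to 100 bad records around each failure before the task fails.
SkipBadRecords.setMapperMaxSkipRecords(conf, 100L);

// Enter skip mode after this many failed attempts of the same task (default 2).
SkipBadRecords.setAttemptsToStartSkipping(conf, 2);

// Allow extra attempts so the framework can narrow down the bad record range.
conf.setMaxMapAttempts(8);
```

Skipped records are written to a skip output directory (see SkipBadRecords.setSkipOutputPath) so they can be inspected afterwards. Check the javadoc for your Hadoop version before relying on these names.]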



> Date: Sun, 7 Aug 2011 01:11:29 +0530
> From: [email protected]
> Subject: Help on DFSClient
> To: [email protected]; [email protected]
>
> I am keeping a stream open and writing through it from a multithreaded 
> application.
> The application is on a different box, and I am connecting to the NameNode remotely.
>
> I was using FileSystem and getting this error, and now I am trying DFSClient 
> and getting the same error.
>
> When I run it via a simple standalone class, it does not throw any error, 
> but when I put it in my application, it throws this error.
>
> Please help me with this.
>
> Regards,
> JD
>
>
>  public String toString() {
>       String s = getClass().getSimpleName();
>       if (LOG.isTraceEnabled()) {
>         return s + "@" + DFSClient.this + ": "
>                + StringUtils.stringifyException(new Throwable("for testing"));
>       }
>       return s;
>     }
>
> My stack trace:
>
> 06Aug2011 12:29:24,345 DEBUG [listenerContainer-1] (DFSClient.java:1115) - Wait for lease checker to terminate
> 06Aug2011 12:29:24,346 DEBUG [LeaseChecker@DFSClient[clientName=DFSClient_280246853, ugi=jagarandas]: java.lang.Throwable: for testing
> at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.toString(DFSClient.java:1181)
> at org.apache.hadoop.util.Daemon.<init>(Daemon.java:38)
> at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.put(DFSClient.java:1094)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:547)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:513)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:497)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:442)
> at com.apple.ireporter.common.persistence.ConnectionManager.createConnection(ConnectionManager.java:74)
> at com.apple.ireporter.common.persistence.HDPPersistor.writeToHDP(HDPPersistor.java:95)
> at com.apple.ireporter.datatransformer.translator.HDFSTranslator.persistData(HDFSTranslator.java:41)
> at com.apple.ireporter.datatransformer.adapter.TranslatorAdapter.processData(TranslatorAdapter.java:61)
> at com.apple.ireporter.datatransformer.DefaultMessageListener.persistValidatedData(DefaultMessageListener.java:276)
> at com.apple.ireporter.datatransformer.DefaultMessageListener.onMessage(DefaultMessageListener.java:93)
> at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:506)
> at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:463)
> at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:435)
> at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:322)
> at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:260)
> at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:944)
> at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:868)
> at java.lang.Thread.run(Thread.java:680)
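[An aside on the setup described above: HDFS permits only a single writer per file, and an open output stream is not safe for concurrent use, so writes from many threads must be serialized by the application. A minimal sketch of that pattern, using a plain java.io.OutputStream as a stand-in for the HDFS output stream; the class name SerializedWriter is made up:

```java
import java.io.IOException;
import java.io.OutputStream;

// Funnels all writes from multiple threads through one lock so that the
// single underlying stream never sees interleaved or concurrent writes.
public class SerializedWriter {
    private final OutputStream out;

    public SerializedWriter(OutputStream out) {
        this.out = out;
    }

    // All threads call this; synchronized serializes access to the stream.
    public synchronized void write(byte[] record) throws IOException {
        out.write(record);
        out.flush();
    }
}
```

The same idea applies whether the stream comes from FileSystem.create() or DFSClient.create(): share one wrapper instance across threads instead of the raw stream.]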
