Hello,
I have an MR job that talks to HBase through Gora. Gora also provides a
couple of classes that can be extended to write Mappers and Reducers when
the Mappers read their input from an HBase store and the Reducers write
their output to an HBase store. That is why I use Gora.

Now, when I run my MR job, I get the exception below (see
https://issues.apache.org/jira/browse/HADOOP-3093):
java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
    at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
    at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
    ... 9 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:77)
    at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:205)
    at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:234)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
    at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
    at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
    ... 11 more

I tried the following to work through this issue.
0. The stack trace indicates that, while setting up a new Mapper, Hadoop is
unable to deserialize something. (I could not work out where exactly it
fails.)
1. I looked around the forums and realized that the serialization options
were not being passed along, so I tried setting the io.serializations
config on the job.
   1.1. I do not set "io.serializations" myself; I use
GoraMapReduceUtils.setIOSerializations() to do it. I verified that the
conf contains the proper serializers.
2. I checked the job XML to see whether these confs had got through, and
they had. But the job failed again.
3. I tried starting the Hadoop job runner with debug options turned on and
in suspend mode (-Xdebug with suspend=y), and I also set the VM options for
the mapred child tasks via mapred.child.java.opts, to see whether I could
debug the newly spawned VM. Although I get a message on stdout saying that
port X is open and waiting, attaching a remote debugger to that port does
not work.
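
For reference, the kind of option string step 3 is about can be sketched as below. This is a minimal illustration, not what my job actually runs: the class name ChildDebugOpts, the helper jdwpOpts and the port 8000 are made up for the example, and the call that would hand the string to Hadoop appears only as a comment so the snippet compiles without Hadoop on the classpath.

```java
// Sketch only: builds the JVM option string for remote-debugging a child task.
// ChildDebugOpts and jdwpOpts are hypothetical names used for illustration.
final class ChildDebugOpts {

    /** Returns JDWP options that make a JVM listen for a debugger on the given port. */
    static String jdwpOpts(int port, boolean suspend) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("bad port: " + port);
        }
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend="
                + (suspend ? "y" : "n") + ",address=" + port;
    }

    public static void main(String[] args) {
        // 8000 is an arbitrary example port.
        String opts = jdwpOpts(8000, true);
        // With Hadoop on the classpath, this value would be set on the job, e.g.
        //   conf.set("mapred.child.java.opts", opts);
        System.out.println("mapred.child.java.opts=" + opts);
    }
}
```

A fixed port presumably only works while a single child JVM runs on the node at a time, since a second task would try to bind the same port. Another route that may be worth trying, assuming Hadoop 1.x, is setting mapred.job.tracker to local so that the whole job runs inside one JVM that a debugger can attach to directly.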

I understand that when SerializationFactory tries to deserialize
'something', it does not find an appropriate deserializer, and so it fails.
I would like to know how to find out what that 'something' is, and I would
also like some idea of how (pseudo-)distributed MR jobs are generally
debugged. I searched but did not find anything useful.
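
To make my reading of the failure concrete, here is a toy model of the lookup as I understand it. It is an assumption, not Hadoop's source: a factory holds a list of registered serializations, asks each one whether it accepts the requested class, and ends up with null when none does. All names in it (SerializationLookupSketch, find, findOrExplain) are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a serialization lookup; NOT Hadoop's actual code.
final class SerializationLookupSketch {

    /** Hypothetical miniature of a serialization: it only says which classes it accepts. */
    interface Serialization {
        boolean accept(Class<?> c);
    }

    private final List<Serialization> registered = new ArrayList<>();

    void register(Serialization s) {
        registered.add(s);
    }

    /** Returns the first serialization that accepts c, or null when none does. */
    Serialization find(Class<?> c) {
        for (Serialization s : registered) {
            if (s.accept(c)) {
                return s;
            }
        }
        return null;
    }

    /** Same lookup, but names the offending class instead of handing back null. */
    Serialization findOrExplain(Class<?> c) {
        Serialization s = find(c);
        if (s == null) {
            String name = (c == null) ? "null" : c.getName();
            throw new IllegalStateException("no serialization accepts " + name);
        }
        return s;
    }

    public static void main(String[] args) {
        SerializationLookupSketch factory = new SerializationLookupSketch();
        // In this toy registry only CharSequence subtypes are accepted.
        factory.register(c -> c != null && CharSequence.class.isAssignableFrom(c));
        System.out.println(factory.find(String.class) != null);
        try {
            factory.findOrExplain(Long.class);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real lookup behaves like find(), then calling a method on its result would give exactly a bare NullPointerException with no hint of the class involved. Something like findOrExplain(), or simply logging the class name just before the deserialize call, would be one way to learn what the 'something' is.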

Any help or pointers would be greatly appreciated.

Thanks!

-- 
It's just about how deep your longing is!
