Hello,

I have an MR job that talks to HBase, and I use Gora to do so. Gora also provides a couple of classes that can be extended to write Mappers and Reducers when the Mappers need input from an HBase store and the Reducers need to write their output to an HBase store. This is the reason why I use Gora.
Now, when I run my MR job, I get an exception as below. (https://issues.apache.org/jira/browse/HADOOP-3093)

java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.lang.NullPointerException
    at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
    at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
    at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
    ... 9 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:77)
    at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:205)
    at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:234)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
    at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
    at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
    ... 11 more

I tried the following things to work through this issue.

0.
The stack trace indicates that, when setting up a new Mapper, it is unable to deserialize something. (I could not work out where exactly it fails.)
1. I looked around the forums and realized that serialization options were not getting passed, so I tried setting the *io.serializations* config on the job.
1.1. I am not setting "io.serializations" myself; I use GoraMapReduceUtils.setIOSerializations() to do it. I verified that the confs are getting the proper serializers.
2. I checked the job XML to see whether these confs had got through, and they had. But it failed again.
3. I tried starting the Hadoop job runner with debug options turned on and in suspend mode (-XDebug suspend=y), and I also set the VM options for mapred child tasks via mapred.child.java.opts, to see if I can debug the newly spawned VM. Although I get a message on my stdout saying that it is opening port X and waiting, attaching a remote debugger to that port does not work.

I understand that, when SerializationFactory tries to deserialize 'something', it does not find an appropriate unmarshaller, and so it fails. But I would like to know a way to find that 'something', and I would like to get some idea of how (pseudo-)distributed MR jobs should generally be debugged. I tried searching but did not find anything useful.

Any help/pointers would be greatly appreciated.

Thanks!

--
It's just about how deep your longing is!
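
One way to narrow down the 'something' is to check whether every class named in io.serializations can actually be loaded in the JVM where the failure happens. The sketch below is only that: a minimal, plain-Java check that does not use Hadoop or Gora at all. The class name SerializationClasspathCheck and the method findUnloadable are hypothetical names introduced for illustration. It rests on the assumption, which should be verified against the Hadoop version in use, that a serialization class missing from the task classpath is skipped rather than reported as a hard error, so the value in the job XML can look correct while the task still has no matching deserializer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Hadoop or Gora): reports which class names
 * from a comma-separated list, such as the value of io.serializations,
 * cannot be loaded on the current classpath.
 */
class SerializationClasspathCheck {

    /** Returns the entries that could not be loaded, in their original order. */
    static List<String> findUnloadable(String commaSeparatedClassNames) {
        List<String> missing = new ArrayList<>();
        for (String entry : commaSeparatedClassNames.split(",")) {
            String name = entry.trim();
            if (name.isEmpty()) {
                continue; // tolerate blank entries and stray commas
            }
            try {
                // initialize=false: we only care whether the class is present,
                // not about running its static initializers.
                Class.forName(name, false,
                        SerializationClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass the io.serializations value copied from the job XML as the
        // first argument; the default below is only a placeholder.
        String value = args.length > 0
                ? args[0]
                : "org.apache.hadoop.io.serializer.WritableSerialization";
        System.out.println("Not loadable here: " + findUnloadable(value));
    }
}
```

To be meaningful, the check has to run with the same classpath as the failing map task, for example by calling findUnloadable from the job's own code and logging the result. An empty result would rule out a missing class and point instead at the type being deserialized inside QueryBase.readFields.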