Hello Harsh,
Thanks for your investigation. While we were debugging, I saw exactly the
same thing. As you pointed out, we suspected that to be the problem, so we
set the job's conf object directly on Gora's query object.
It goes something like this:
query.setConf..(job.getConfig..())

And after that, I saw that it no longer went into creating a new
Configuration object in getOrCreateConf().
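
Spelled out a bit more, the workaround is roughly the sketch below (my
assumptions: the concrete QueryBase implementation exposes setConf() via
Configurable, job is the usual new-API Job, and dataStore/K/T stand in for
our Gora DataStore and its key and persistent classes):

Job job = new Job(conf, "my-gora-job");
Query<K, T> query = dataStore.newQuery();
// Hand the job's Configuration to the query up front, so Gora does not
// fall back to a freshly constructed Configuration when the query is
// (de)serialized later:
query.setConf(job.getConfiguration());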

OTOH, I have not tried the job.xml approach yet. I will give it a try and
keep this thread posted.

I would also like to hear about standard practices for debugging
distributed MR tasks.
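
For reference, what I had set for the child-task VMs was along these lines
(the JDWP flags are standard JVM options, the port number is arbitrary, and
I am not sure this wiring is entirely right for my setup):

// on the job's Configuration, before submitting:
conf.set("mapred.child.java.opts",
    "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8047");
// with more than one task slot per node, multiple child JVMs would try to
// bind the same port, so this only really works with a single map slot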

On 28 Jul 2012 03:30, "Harsh J" <ha...@cloudera.com> wrote:

> Hi Sriram,
>
> I suspect the following in Gora to somehow be causing this issue:
>
> IOUtils source:
>
> http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/util/IOUtils.java?view=markup
> QueryBase source:
>
> http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/query/impl/QueryBase.java?view=markup
>
> Notice that IOUtils.deserialize(…) calls expect a proper Configuration
> object. If not passed (i.e., if null), they call the following.
>
> private static Configuration getOrCreateConf(Configuration conf) {
>   if (conf == null) {
>     if (IOUtils.conf == null) {
>       IOUtils.conf = new Configuration();
>     }
>   }
>   return conf != null ? conf : IOUtils.conf;
> }
>
> Now QueryBase, in its readFields method, has some IOUtils.deserialize(…)
> calls that seem to pass a null for the configuration object. The
> IOUtils.deserialize(…) method therefore calls the method above and
> initializes a whole new Configuration object, since the conf passed in
> is null.
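>
> To make the call shape concrete (this is my paraphrase, not the exact
> Gora code, and the argument order is approximate):
>
> // somewhere inside QueryBase#readFields(DataInput in):
> field = IOUtils.deserialize(null, in, field); // conf argument is null
> // ...which sends getOrCreateConf() down the new-Configuration branch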
>
> If it does that, it will not be loading the "job.xml" file contents,
> which is the job's config file (that is something only the map task's own
> configuration setup loads; it is not a file a plain Configuration loads
> by default). Hence, the custom serializers disappear the moment Gora
> begins using this new Configuration object.
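>
> To illustrate the effect (untested sketch, assuming it runs inside a map
> task's JVM):
>
> Configuration fresh = new Configuration(); // loads only the *-default / *-site files
> fresh.get("io.serializations"); // the job's extra serializers are missing here,
>                                 // because job.xml was never added as a resource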
>
> This is what you'll want to investigate and fix, or notify the Gora devs
> about (why QueryBase#readFields passes a null conf object, and whether it
> can reuse an already-set conf object). As a cheap hack, maybe doing the
> following will make it work in an MR environment?
>
> IOUtils.conf = new Configuration();
> IOUtils.conf.addResource("job.xml");
>
> I haven't tried the above, but let us know how we can be of further
> assistance. An ideal fix would be to only use the MapTask's provided
> Configuration object everywhere, somehow, and never re-create one.
>
> P.s. If you want a thread ref link to share with other devs over Gora,
> here it is: http://search-hadoop.com/m/BXZA4dTUFC
>
> On Fri, Jul 27, 2012 at 1:24 PM, Sriram Ramachandrasekaran
> <sri.ram...@gmail.com> wrote:
> > Hello,
> > I have an MR job that talks to HBase, and I use Gora to do so. Gora also
> > provides a couple of classes which can be extended to write Mappers and
> > Reducers when the Mappers need their input from an HBase store and the
> > Reducers need to write their output to an HBase store. This is the
> > reason why I use Gora.
> >
> > Now, when I run my MR job, I get an exception as below.
> > (https://issues.apache.org/jira/browse/HADOOP-3093)
> > java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
> >   at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
> >   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> >   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
> >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >   at java.security.AccessController.doPrivileged(Native Method)
> >   at javax.security.auth.Subject.doAs(Subject.java:415)
> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> >   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > Caused by: java.io.IOException: java.lang.NullPointerException
> >   at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
> >   at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
> >   at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
> >   ... 9 more
> > Caused by: java.lang.NullPointerException
> >   at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:77)
> >   at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:205)
> >   at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:234)
> >   at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> >   at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> >   at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
> >   at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
> >   at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
> >   ... 11 more
> >
> > I tried the following things to work through this issue.
> > 0. The stack trace indicates that, when setting up a new Mapper, it is
> > unable to deserialize something. (I could not figure out exactly where
> > it fails.)
> > 1. I looked around the forums and realized that the serialization
> > options were not getting passed, so I tried setting the io.serializations
> > config on the job.
> >    1.1. I am not setting "io.serializations" myself; I use
> > GoraMapReduceUtils.setIOSerializations() to do it. I verified that the
> > confs are getting the proper serializers.
> > 2. I checked the job xml to see if these confs had gotten through; they
> > had. But it failed again.
> > 3. I tried starting the hadoop job runner with debug options turned on
> > and in suspend mode (-Xdebug, suspend=y), and I also set the VM options
> > for the mapred child tasks via mapred.child.java.opts, to see if I could
> > debug the newly spawned VM. Although I get a message on stdout saying it
> > is opening port X and waiting, when I try to attach a remote debugger to
> > that port, it does not work.
> >
> > I understand that when SerializationFactory tries to deserialize
> > 'something', it does not find an appropriate deserializer and so it
> > fails. But I would like to know a way to find out what that 'something'
> > is, and I would like to get some idea of how (pseudo-)distributed MR
> > jobs should generally be debugged. I tried searching but did not find
> > anything useful.
> >
> > Any help/pointers would be greatly appreciated.
> >
> > Thanks!
> >
> > --
> > It's just about how deep your longing is!
> >
>
>
>
> --
> Harsh J
>
