okay. But this issue didn't present itself when run in standalone mode. :)

On 28 Jul 2012 06:02, "Harsh J" <ha...@cloudera.com> wrote:
> I find it easier to run jobs via MRUnit (http://mrunit.apache.org,
> TDD) first, or via LocalJobRunner, for debug purposes.
>
> On Sat, Jul 28, 2012 at 5:53 AM, Sriram Ramachandrasekaran
> <sri.ram...@gmail.com> wrote:
> > hello harsh,
> > thanks for your investigations. while we were debugging, I saw the exact
> > thing. As you pointed out, we suspected it to be a problem. So, we set the
> > job conf object directly on Gora's query object.
> > It goes something like this,
> > query.setConf..(job.getConfig..())
> >
> > And, then I saw that it was not getting into creating a new object at
> > getOrCreate().
> >
> > OTOH, I've not tried the job.xml thing. I should give it a try n I shall
> > keep the loop posted.
> >
> > I would also like to hear about standard practices for debugging
> > distributed MR tasks.
> >
> > -----
> > reply from a hh device. Pl excuse typos n lack of formatting.
> >
> > On 28 Jul 2012 03:30, "Harsh J" <ha...@cloudera.com> wrote:
> >>
> >> Hi Sriram,
> >>
> >> I suspect the following in Gora to somehow be causing this issue:
> >>
> >> IOUtils source:
> >> http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/util/IOUtils.java?view=markup
> >> QueryBase source:
> >> http://svn.apache.org/viewvc/gora/trunk/gora-core/src/main/java/org/apache/gora/query/impl/QueryBase.java?view=markup
> >>
> >> Notice that IOUtils.deserialize(…) calls expect a proper Configuration
> >> object. If not passed (i.e., if null), they call the following.
> >>
> >>     private static Configuration getOrCreateConf(Configuration conf) {
> >>       if(conf == null) {
> >>         if(IOUtils.conf == null) {
> >>           IOUtils.conf = new Configuration();
> >>         }
> >>       }
> >>       return conf != null ? conf : IOUtils.conf;
> >>     }
> >>
> >> Now QueryBase has, in its readFields method, some
> >> IOUtils.deserialize(…) calls that seem to pass a null for the
> >> configuration object.
> >> The IOUtils.deserialize(…) method hence calls this above method,
> >> and initializes a whole new Configuration object, as the passed conf
> >> object is null.
> >>
> >> If it does that, it would not be loading the "job.xml" file contents,
> >> which is the job's config file (that's something the map task's config
> >> set alone loads, and not a file that's loaded by default). Hence,
> >> custom serializers will disappear the moment it begins using this new
> >> Configuration object.
> >>
> >> This is what you'll want to investigate and fix or notify the Gora
> >> devs about (why QueryBase#readFields uses a null object, and if it can
> >> reuse some set conf object). As a cheap hack fix, maybe doing the
> >> following will make it work in an MR environment?
> >>
> >>     IOUtils.conf = new Configuration();
> >>     IOUtils.conf.addResource("job.xml");
> >>
> >> I haven't tried the above, but let us know how we can be of further
> >> assistance. An ideal fix would be to only use the MapTask's provided
> >> Configuration object everywhere, somehow, and never re-create one.
> >>
> >> P.s. If you want a thread ref link to share with other devs over Gora,
> >> here it is: http://search-hadoop.com/m/BXZA4dTUFC
> >>
> >> On Fri, Jul 27, 2012 at 1:24 PM, Sriram Ramachandrasekaran
> >> <sri.ram...@gmail.com> wrote:
> >> > Hello,
> >> > I have an MR job that talks to HBase. I use Gora to talk to HBase. Gora
> >> > also provides a couple of classes which can be extended to write Mappers
> >> > and Reducers, if the mappers need input from an HBase store and Reducers
> >> > need to write it out to an HBase store. This is the reason why I use Gora.
> >> >
> >> > Now, when I run my MR job, I get an exception as below.
> >> > (https://issues.apache.org/jira/browse/HADOOP-3093)
> >> > java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
> >> >     at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:115)
> >> >     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> >> >     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:723)
> >> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >> >     at java.security.AccessController.doPrivileged(Native Method)
> >> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> >> >     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >> > Caused by: java.io.IOException: java.lang.NullPointerException
> >> >     at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:483)
> >> >     at org.apache.gora.mapreduce.GoraInputFormat.getQuery(GoraInputFormat.java:125)
> >> >     at org.apache.gora.mapreduce.GoraInputFormat.setConf(GoraInputFormat.java:112)
> >> >     ... 9 more
> >> > Caused by: java.lang.NullPointerException
> >> >     at org.apache.hadoop.io.serializer.SerializationFactory.getDeserializer(SerializationFactory.java:77)
> >> >     at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:205)
> >> >     at org.apache.gora.query.impl.QueryBase.readFields(QueryBase.java:234)
> >> >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> >> >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> >> >     at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:75)
> >> >     at org.apache.hadoop.io.DefaultStringifier.load(DefaultStringifier.java:133)
> >> >     at org.apache.gora.util.IOUtils.loadFromConf(IOUtils.java:480)
> >> >     ... 11 more
> >> >
> >> > I tried the following things to work through this issue.
> >> > 0. The stack trace indicates that, when setting up a new Mapper, it is
> >> > unable to deserialize something. (I could not get to understand where it
> >> > fails).
> >> > 1. I looked around the forums and realized that serialization options are
> >> > not getting passed, so, I tried setting up, io.serializations config on the
> >> > job.
> >> > 1.1. I am not setting up the "io.serializations" myself, I use
> >> > GoraMapReduceUtils.setIOSerializations() to do it. I verified that, the
> >> > confs are getting proper serializers.
> >> > 2. I verified in the job xml to see if these confs have got through, they
> >> > were. But, it failed again.
> >> > 3. I tried starting the hadoop job runner with debug options turned on and
> >> > in suspend mode, -XDebug suspend=y and I also set the VM options for mapred
> >> > child tasks, via the mapred.child.java.opts to see if I can debug the VM
> >> > that gets spawned newly.
> >> > Although I get a message on my stdout saying,
> >> > opening port X and waiting, when I try to attach a remote debugger on that
> >> > port, it does not work.
> >> >
> >> > I understand that, when SerializationFactory tries to deSerialize
> >> > 'something', it does not find an appropriate unmarshaller and so it fails.
> >> > But, I would like to know a way to find that 'something' and I would like
> >> > to get some idea on how (pseudo) distributed MR jobs should be generally
> >> > debugged. I tried searching, did not find anything useful.
> >> >
> >> > Any help/pointers would be greatly useful.
> >> >
> >> > Thanks!
> >> >
> >> > --
> >> > It's just about how deep your longing is!
> >> >
> >>
> >> --
> >> Harsh J
> >
>
> --
> Harsh J
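
For anyone reading the diagnosis in this thread later, here is a minimal, self-contained sketch of why the null-conf fallback would lose the job's serializers. It deliberately does not use Hadoop or Gora: Conf is a hypothetical stand-in for org.apache.hadoop.conf.Configuration, the fallback field stands in for the static IOUtils.conf, and the serializer names are placeholders rather than real class names. Only the shape of getOrCreateConf is taken from the code quoted in the thread.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfFallbackSketch {

    /** Hypothetical stand-in for org.apache.hadoop.conf.Configuration. */
    public static final class Conf {
        private final Map<String, String> props = new HashMap<>();

        public Conf() {
            // A freshly created object only knows built-in defaults
            // (placeholder value, not a real Hadoop class name).
            props.put("io.serializations", "default-serializer");
        }

        public void set(String key, String value) {
            props.put(key, value);
        }

        public String get(String key) {
            return props.get(key);
        }
    }

    /** Shared fallback, standing in for the static IOUtils.conf field. */
    private static Conf fallback;

    /** Same shape as the getOrCreateConf method quoted in the thread. */
    public static Conf getOrCreateConf(Conf conf) {
        if (conf == null) {
            if (fallback == null) {
                fallback = new Conf();
            }
        }
        return conf != null ? conf : fallback;
    }

    public static void main(String[] args) {
        // The job's configuration carries the extra (placeholder) serializer.
        Conf jobConf = new Conf();
        jobConf.set("io.serializations", "default-serializer,custom-serializer");

        // Passing null falls back to a fresh object: the custom entry is gone.
        System.out.println(getOrCreateConf(null).get("io.serializations"));
        // prints "default-serializer"

        // Passing the job's own object keeps it.
        System.out.println(getOrCreateConf(jobConf).get("io.serializations"));
        // prints "default-serializer,custom-serializer"
    }
}
```

In this toy model, handing the job's own configuration object to every call site avoids the fallback entirely, which matches the "ideal fix" Harsh describes; whether Gora's call sites can be changed that way is something the sketch does not establish.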