On 02/16/2012 10:15 AM, Harsh J wrote:
> That is how HBase does it: HBaseConfiguration at driver loads up HBase
> *xml file configs from driver classpath (or user set() entries, either
> way), and then submits that as part of job.xml. These configs should
> be all you need.
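
(For context, the driver-side pattern Harsh describes looks roughly like this; a rough sketch assuming the HBase 0.90-era API, with a made-up job name and quorum:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.mapreduce.Job;

    public class DriverSketch {
        public static void main(String[] args) throws Exception {
            // create() pulls hbase-default.xml / hbase-site.xml off the driver classpath
            Configuration conf = HBaseConfiguration.create();
            // explicit set() entries work the same way; either ends up in job.xml at submit time
            conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");
            Job job = new Job(conf, "my-job");
            // ... mapper/reducer, input/output formats, then job.waitForCompletion(true)
        }
    }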

It should be, and yet I'm running into sporadic problems. The details are somewhat separate from MapReduce proper, and I'm still not sure of the exact root cause (sporadic bugs are the worst), but it seems to come down to an odd confluence of behaviors among Oozie, ZooKeeper, and Accumulo (another BigTable implementation).

The gist is that occasionally -- seemingly at random -- the Oozie-launched Java program needs to go looking for the Accumulo site configuration, which means searching for an XML file resource on the classpath. Not finding it, it falls back to the defaults, so Accumulo no longer knows where my cluster's ZooKeeper servers are; it keeps trying to reconnect to localhost (the default) and fails in an endless loop.
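
A cheap way to see which case a given launch hit is to probe the classpath from inside the launched main; a small sketch, assuming the site file is named accumulo-site.xml as in a stock install:

    import java.net.URL;

    public class ClasspathProbe {
        public static void main(String[] args) {
            // null here means the resource lookup will fail and the client
            // falls back to built-in defaults, including localhost for ZooKeeper
            URL site = Thread.currentThread().getContextClassLoader()
                             .getResource("accumulo-site.xml");
            System.out.println("accumulo-site.xml -> "
                    + (site == null ? "not on classpath" : site));
        }
    }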

So yes, I've set the relevant properties in my own configuration, which I hand to Oozie, but when "something" happens (my wild guess: a lost ZooKeeper lock?) Accumulo insists on consulting its SiteConfiguration, which means loading the XML resource.
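
Roughly the shape of the approach -- a sketch rather than my actual code: the property names below are invented for illustration, and (if I recall correctly) the oozie.action.conf.xml system property is how a <java> action finds the configuration Oozie passes it:

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Instance;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class OozieLaunchedMain {
        public static void main(String[] args) throws Exception {
            // load the action configuration Oozie materializes for a <java> action
            Configuration conf = new Configuration(false);
            conf.addResource(new Path("file:///", System.getProperty("oozie.action.conf.xml")));

            // property names below are mine (set in the workflow), not Accumulo's
            String instance = conf.get("accumulo.instance.name");
            String zookeepers = conf.get("accumulo.zookeepers");

            // connect via an explicit ZooKeeperInstance instead of SiteConfiguration
            Instance inst = new ZooKeeperInstance(instance, zookeepers);
            Connector conn = inst.getConnector("user", "secret".getBytes());
            // ... scanners/writers from conn
        }
    }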

For the moment I've placed a symlink in $HADOOP_HOME/conf/ to the needed Accumulo configuration file, but I'm wondering whether I can simply give the task JVMs access to the Accumulo configuration directories as well.
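
One way that might work, sketched against the Hadoop 0.20/1.x API: put accumulo-site.xml on HDFS and have the job add it to each task's classpath through the distributed cache. The HDFS path here is made up, and I'm not certain this covers the Oozie launcher JVM as opposed to the MR tasks:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class ShipSiteConfig {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // the file must already be on HDFS; each task then sees it on its classpath
            DistributedCache.addFileToClassPath(new Path("/config/accumulo-site.xml"), conf);
            Job job = new Job(conf, "accumulo-job");
            // ... rest of job setup, then job.waitForCompletion(true)
        }
    }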
