Robert Kanter created OOZIE-2310:
------------------------------------
Summary: If the Hadoop configuration is not configured, you get a
NullPointerException on job submission
Key: OOZIE-2310
URL: https://issues.apache.org/jira/browse/OOZIE-2310
Project: Oozie
Issue Type: Bug
Components: core
Affects Versions: 4.1.0
Reporter: Robert Kanter
Priority: Blocker
A user reported an NPE on startup here:
http://mail-archives.apache.org/mod_mbox/oozie-user/201507.mbox/%3cCALBGZ8oZ0GZ+hf76nQYKxiATHH5g2gbQ_0sQ78uQv_=r4Hct=q...@mail.gmail.com%3e
I did some digging and the problem is that Oozie is trying to load the ShareLib,
but the {{FileSystem}} class variable is {{null}} because the
{{ShareLibService}} wasn't able to create it on {{init}}. That would normally
cause Oozie to fail on startup, but the default value of
{{oozie.service.ShareLibService.fail.fast.on.startup}} is {{false}}, so it gets
ignored.
The code in question is this:
{code:java}
try {
    fs = FileSystem.get(has.createJobConf(uri.getAuthority()));
    //cache action key sharelib conf list
    cacheActionKeySharelibConfList();
    updateLauncherLib();
    updateShareLib();
}
catch (Throwable e) {
    if (failOnfailure) {
        LOG.error("Sharelib initialization fails", e);
        throw new ServiceException(ErrorCode.E0104, getClass().getName(), "Sharelib initialization fails. ", e);
    }
    else {
        // We don't want to actually fail init by throwing an Exception, so only create the ServiceException and
        // log it
        ServiceException se = new ServiceException(ErrorCode.E0104, getClass().getName(),
                "Not able to cache sharelib. An Admin needs to install the sharelib with oozie-setup.sh and issue the "
                        + "'oozie admin' CLI command to update the sharelib", e);
        LOG.error(se);
    }
}
{code}
where {{failOnfailure}} is {{false}} by default. So, {{fs}} ends up being
{{null}}, and if anything later tries to use it, you get an NPE.
I think we should do two things here:
# Creating the {{FileSystem}} should be in a different try-catch so that the
{{failOnfailure}} doesn't affect it. The original intention of that behavior
was to ignore ShareLib failures, not Hadoop failures.
# We should improve the default Hadoop configuration (i.e.
{{oozie.service.HadoopAccessorService.hadoop.configurations}}). This has been
a problem for a while now: out of the box, Oozie doesn't work even for a
local pseudo-cluster because of this config's default. If that's not possible,
we need to make it more obvious that users must configure this before doing
anything.
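
To illustrate the first suggestion, here is a rough sketch of what splitting the two failure modes might look like. It is not the actual patch: the class and its nested types ({{ShareLibInitSketch}}, {{FileSystemHandle}}, {{FileSystemFactory}}, {{ShareLibLoader}}, {{ServiceInitException}}) are hypothetical stand-ins so the example compiles without Hadoop or Oozie on the classpath. Only the control flow is the point: a failure to create the file system always aborts init, while a ShareLib failure still honors the fail-fast flag.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these types are real Oozie or Hadoop classes. */
class ShareLibInitSketch {

    /** Stand-in for org.apache.hadoop.fs.FileSystem. */
    interface FileSystemHandle { }

    /** Stand-in for building the FileSystem from the Hadoop configuration. */
    interface FileSystemFactory {
        FileSystemHandle create() throws IOException;
    }

    /** Stand-in for the sharelib caching steps (updateLauncherLib(), updateShareLib(), ...). */
    interface ShareLibLoader {
        void load(FileSystemHandle fs) throws IOException;
    }

    /** Stand-in for Oozie's ServiceException. */
    static class ServiceInitException extends Exception {
        ServiceInitException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private FileSystemHandle fs;
    private boolean shareLibLoaded;

    void init(FileSystemFactory factory, ShareLibLoader loader, boolean failOnFailure)
            throws ServiceInitException {
        // Step 1: a Hadoop failure always fails init, regardless of failOnFailure.
        try {
            fs = factory.create();
        }
        catch (Exception e) {
            throw new ServiceInitException(
                    "Could not create the FileSystem; check the Hadoop configuration", e);
        }
        if (fs == null) {
            throw new ServiceInitException("FileSystem factory returned null", null);
        }

        // Step 2: a ShareLib failure is only fatal when fail-fast is enabled.
        try {
            loader.load(fs);
            shareLibLoaded = true;
        }
        catch (Exception e) {
            if (failOnFailure) {
                throw new ServiceInitException("Sharelib initialization fails", e);
            }
            // Log and continue; fs is guaranteed non-null from here on.
            System.err.println("Not able to cache sharelib: " + e.getMessage());
        }
    }

    boolean hasFileSystem() {
        return fs != null;
    }

    boolean isShareLibLoaded() {
        return shareLibLoaded;
    }
}
```

With this shape, a missing ShareLib leaves the service up with a usable {{fs}}, whereas a broken Hadoop configuration surfaces at startup instead of as an NPE later.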