[
https://issues.apache.org/jira/browse/OOZIE-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345077#comment-16345077
]
Attila Sasvari commented on OOZIE-3170:
---------------------------------------
[~jphelps] many thanks for reporting this issue and reviewing related parts of
the codebase. I added you as a contributor to the project and assigned this
Jira to you.
I reproduced the cited NPE. However, the diag bundle zip is still generated
and contains the sharelib info, because the tool gets it through OozieClient's
[listShareLib()|https://github.com/apache/oozie/blob/ef6d0af5edeb18fbc0259d1962ac70f8ad7c2a0c/tools/src/main/java/org/apache/oozie/tools/diag/ServerInfoCollector.java#L42]:
{code:java}
$ bin/oozie-diag-bundle-collector.sh -oozie http://localhost:11000/oozie -output /tmp/jobs/
Checking Connection...Done
Using Temporary Directory: /var/folders/9q/f8p_r6gj0wbck49_dc092q_m0000gp/T/1517319232457-0
Getting Sharelib Information...Done
Getting Configuration...Done
Getting OS Environment Variables...Done
Getting Java System Properties...Done
Getting Queue Dump...Done
Getting Thread Dump...Done
Getting Instrumentation...Done
Getting Metrics...Skipping (Metrics are unavailable)
Creating Zip File: /tmp/jobs/oozie-diag-bundle-1517319233190.zip...Done
$ unzip -l /tmp/jobs/oozie-diag-bundle-1517319233190.zip
68029 01-30-18 14:33 /effective-oozie-site.xml
9876 01-30-18 14:33 /instrumentation.txt
38636 01-30-18 14:33 /java-sys-props.txt
3807 01-30-18 14:33 /os-env-vars.txt
279 01-30-18 14:33 /queue-dump.txt
40032 01-30-18 14:33 /sharelib.txt
102084 01-30-18 14:33 /thread-dump.html{code}
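To check a generated bundle for the sharelib dump without unzipping it by hand, a minimal sketch using only {{java.util.zip}} could look like the following. The class and method names ({{DiagBundleCheck}}, {{entryNames}}, {{containsEntry}}) are hypothetical and not part of Oozie; the leading {{/}} is stripped only because the listing above shows entries such as {{/sharelib.txt}}.

{code:java}
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists the entries of a diag bundle zip, similar to `unzip -l`.
class DiagBundleCheck {

    // Returns entry names with any leading '/' removed (e.g. "/sharelib.txt" -> "sharelib.txt").
    static List<String> entryNames(String zipPath) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            return zip.stream()
                    .map(ZipEntry::getName)
                    .map(name -> name.startsWith("/") ? name.substring(1) : name)
                    .collect(Collectors.toList());
        }
    }

    // True if the bundle contains a file with exactly this name, e.g. "sharelib.txt".
    static boolean containsEntry(String zipPath, String fileName) throws IOException {
        return entryNames(zipPath).contains(fileName);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: DiagBundleCheck <oozie-diag-bundle-....zip>");
            return;
        }
        entryNames(args[0]).forEach(System.out::println);
        System.out.println("sharelib.txt present: " + containsEntry(args[0], "sharelib.txt"));
    }
}
{code}

Running it against the bundle from the session above should list the same seven files as {{unzip -l}}.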
* In fact, I am not sure that all of these Oozie services are actually needed
to collect diagnostic information. If they are not, they should not be loaded
at all.
* There is another problem. By default, if you run the tool from Oozie's home
directory, the logs it generates end up in the server log. This can be very
confusing for an admin or anyone else who reviews the Oozie server logs.
Setting up logging is the responsibility of
[XLogService|https://github.com/apache/oozie/blob/ef6d0af5edeb18fbc0259d1962ac70f8ad7c2a0c/core/src/main/java/org/apache/oozie/service/XLogService.java#L145],
which is started via Services.init(). The log location can be controlled with
the {{oozie.log.dir}} system property (e.g.
{{export JAVA_PROPERTIES="-Doozie.log.dir=/tmp/"}} before running the tool).
We should clarify this in the tool's documentation and/or change the code or
script so that, by default, logs are written to the directory where the diag
bundle is generated.
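A hypothetical sketch of that default (not existing Oozie code): before {{Services.init()}} starts XLogService, set {{oozie.log.dir}} to the bundle output directory unless the user already supplied it. It assumes that setting the property programmatically this early has the same effect as passing it through {{JAVA_PROPERTIES}}; that is unverified, and {{DiagLogDirDefault}} / {{applyDefaultLogDir}} are made-up names.

{code:java}
import java.io.File;

// Hypothetical helper, assumed to run before Services.init() starts XLogService.
class DiagLogDirDefault {

    static final String LOG_DIR_PROPERTY = "oozie.log.dir";

    // Sets oozie.log.dir to outputDir only if it is unset or blank; returns the effective value.
    static String applyDefaultLogDir(String outputDir) {
        String current = System.getProperty(LOG_DIR_PROPERTY);
        if (current == null || current.trim().isEmpty()) {
            String dir = new File(outputDir).getAbsolutePath();
            System.setProperty(LOG_DIR_PROPERTY, dir);
            return dir;
        }
        // A user-supplied value (e.g. -Doozie.log.dir=/tmp/) always wins.
        return current;
    }

    public static void main(String[] args) {
        // In the tool, outputDir would be the value passed to -output.
        String outputDir = args.length > 0 ? args[0] : "/tmp/jobs/";
        System.out.println("Logging to: " + applyDefaultLogDir(outputDir));
    }
}
{code}

Keeping an explicitly set {{oozie.log.dir}} untouched preserves the existing {{JAVA_PROPERTIES}} workaround while changing only the default.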
> Oozie Diagnostic Bundle tool fails with NPE due to missing service class
> ------------------------------------------------------------------------
>
> Key: OOZIE-3170
> URL: https://issues.apache.org/jira/browse/OOZIE-3170
> Project: Oozie
> Issue Type: Bug
> Affects Versions: 5.0.0b1
> Reporter: Jason Phelps
> Priority: Major
> Attachments: OOZIE-3170-1.patch
>
>
>
> When I ran the command below after a clean build from the main branch:
> {code:java}
> bin/oozie-diag-bundle-collector.sh -oozie http://jphelps-60-1.gce.cloudera.com:11000/oozie -output /tmp/jobs/
> {code}
> it failed with an NPE. I apologize for not copying the client error, but
> the error in oozie.log is below:
> {code:java}
> 2018-01-25 10:53:58,123 ERROR ShareLibService:517 - SERVER[] org.apache.oozie.service.ServiceException: E0104: Could not fully initialize service [org.apache.oozie.service.ShareLibService], Not able to cache sharelib. An Admin needs to install the sharelib with oozie-setup.sh and issue the 'oozie admin' CLI command to update the sharelib
> org.apache.oozie.service.ServiceException: E0104: Could not fully initialize service [org.apache.oozie.service.ShareLibService], Not able to cache sharelib. An Admin needs to install the sharelib with oozie-setup.sh and issue the 'oozie admin' CLI command to update the sharelib
> at org.apache.oozie.service.ShareLibService.init(ShareLibService.java:144)
> at org.apache.oozie.service.Services.setServiceInternal(Services.java:386)
> at org.apache.oozie.service.Services.setService(Services.java:372)
> at org.apache.oozie.service.Services.loadServices(Services.java:304)
> at org.apache.oozie.service.Services.init(Services.java:212)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.initOozieServices(DiagBundleCollectorDriver.java:153)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.setHadoopConfig(DiagBundleCollectorDriver.java:135)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.run(DiagBundleCollectorDriver.java:56)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.main(DiagBundleCollectorDriver.java:52)
> Caused by: java.lang.NullPointerException
> at org.apache.oozie.service.ShareLibService.cacheActionKeySharelibConfList(ShareLibService.java:878)
> at org.apache.oozie.service.ShareLibService.init(ShareLibService.java:132)
> ... 8 more
> 2018-01-25 10:53:58,130 INFO PartitionDependencyManagerService:520 - SERVER[] PartitionDependencyManagerService initialized. Dependency cache is org.apache.oozie.dependency.hcat.SimpleHCatDependencyCache
> 2018-01-25 10:53:58,131 FATAL Services:514 - SERVER[] Runtime Exception during Services Load. Check your list of {0} or {1}
> java.lang.NullPointerException
> at org.apache.oozie.service.PartitionDependencyManagerService.init(PartitionDependencyManagerService.java:81)
> at org.apache.oozie.service.PartitionDependencyManagerService.init(PartitionDependencyManagerService.java:71)
> at org.apache.oozie.service.Services.setServiceInternal(Services.java:386)
> at org.apache.oozie.service.Services.setService(Services.java:372)
> at org.apache.oozie.service.Services.loadServices(Services.java:304)
> at org.apache.oozie.service.Services.init(Services.java:212)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.initOozieServices(DiagBundleCollectorDriver.java:153)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.setHadoopConfig(DiagBundleCollectorDriver.java:135)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.run(DiagBundleCollectorDriver.java:56)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.main(DiagBundleCollectorDriver.java:52)
> 2018-01-25 10:53:58,132 FATAL Services:514 - SERVER[] E0103: Could not load service classes, null
> org.apache.oozie.service.ServiceException: E0103: Could not load service classes, null
> at org.apache.oozie.service.Services.loadServices(Services.java:309)
> at org.apache.oozie.service.Services.init(Services.java:212)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.initOozieServices(DiagBundleCollectorDriver.java:153)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.setHadoopConfig(DiagBundleCollectorDriver.java:135)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.run(DiagBundleCollectorDriver.java:56)
> at org.apache.oozie.tools.diag.DiagBundleCollectorDriver.main(DiagBundleCollectorDriver.java:52)
> Caused by: java.lang.NullPointerException
> at org.apache.oozie.service.PartitionDependencyManagerService.init(PartitionDependencyManagerService.java:81)
> at org.apache.oozie.service.PartitionDependencyManagerService.init(PartitionDependencyManagerService.java:71)
> at org.apache.oozie.service.Services.setServiceInternal(Services.java:386)
> at org.apache.oozie.service.Services.setService(Services.java:372)
> at org.apache.oozie.service.Services.loadServices(Services.java:304)
> ... 5 more
>
> {code}
> From my debugging, it looks like PartitionDependencyManagerService needs JobsConcurrencyService to be loaded:
>
> [https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/service/PartitionDependencyManagerService.java#L81]
>
> {code:java}
> purgeEnabled = Services.get().get(JobsConcurrencyService.class).isHighlyAvailableMode();{code}
> But this service is not loaded by the following code:
> [https://github.com/apache/oozie/blob/master/tools/src/main/java/org/apache/oozie/tools/diag/DiagBundleCollectorDriver.java#L149]
> {code:java}
> services.getConf()
> .set(Services.CONF_SERVICE_CLASSES,
> "org.apache.oozie.service.LiteWorkflowAppService,"
> + "org.apache.oozie.service.SchedulerService,"
> + "org.apache.oozie.service.HadoopAccessorService,"
> + "org.apache.oozie.service.ShareLibService");{code}
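For illustration only, here is a hypothetical sketch of the service list above with {{JobsConcurrencyService}} added, so that {{Services.get().get(JobsConcurrencyService.class)}} would not return null when {{PartitionDependencyManagerService.init()}} runs. This is an assumption about one possible fix, not the contents of OOZIE-3170-1.patch. It assumes {{JobsConcurrencyService}} is initialized before the service that looks it up, and it only builds the comma-separated value that would be set under {{Services.CONF_SERVICE_CLASSES}}; {{DiagServiceList}} is a made-up name.

{code:java}
import java.util.Arrays;
import java.util.List;

// Hypothetical: the diag tool's service list, plus JobsConcurrencyService.
class DiagServiceList {

    // Service class names in the order they are listed (and, presumably, initialized).
    static final List<String> SERVICE_CLASSES = Arrays.asList(
            "org.apache.oozie.service.LiteWorkflowAppService",
            "org.apache.oozie.service.SchedulerService",
            "org.apache.oozie.service.HadoopAccessorService",
            // Added: looked up by PartitionDependencyManagerService.init().
            "org.apache.oozie.service.JobsConcurrencyService",
            "org.apache.oozie.service.ShareLibService");

    // Comma-separated value, in the same format the driver currently builds by string concatenation.
    static String serviceClassesValue() {
        return String.join(",", SERVICE_CLASSES);
    }

    public static void main(String[] args) {
        System.out.println(serviceClassesValue());
    }
}
{code}

Listing the one missing dependency explicitly, rather than loading every server service, keeps the tool's service footprint small.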
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)