Hi there,
I have been working on getting Accumulo running on the IBM JDK, in preparation
for including Accumulo in an upcoming version of BigInsights (IBM's Hadoop
distribution). I have come across a number of issues, for which I have made
local fixes in my own environment. Since I'm new to Accumulo, I
wanted to make sure that the approach I have taken to resolve
these issues is aligned with Accumulo's design intent.
Some of the issues are real defects, and some are places where the
assumption that the JVM is Sun/Oracle's is hard-coded into the
source code.
I have grouped the issues into two sections: unit test failures and
Sun-specific dependencies (though there is some overlap).
1. Unit test failures (unit tests should pass consistently regardless of OS,
Java vendor/version, etc.)
a. org.apache.accumulo.core.util.format.ShardedTableDistributionFormatterTest.testAggregate
fails on the IBM JRE because the test asserts the iteration order of elements
in a HashMap, which is unspecified. It consistently passes on Sun/Oracle and
consistently fails on IBM.
Proposal: change ShardedTableDistributionFormatter.countsByDay to a
TreeMap.
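To illustrate why the TreeMap change makes the assertion deterministic, here
is a minimal standalone sketch. The class, the fillCounts helper and the
sample day keys are hypothetical stand-ins, not the real
ShardedTableDistributionFormatter code; the point is only that HashMap
iteration order is unspecified, while TreeMap always iterates in ascending
key order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class CountsByDayOrdering {

    // Hypothetical helper standing in for however countsByDay gets populated.
    // Keys are inserted deliberately out of order.
    static Map<String, Integer> fillCounts(Map<String, Integer> counts) {
        counts.put("20140103", 5);
        counts.put("20140101", 2);
        counts.put("20140102", 7);
        return counts;
    }

    // Renders entries in the map's own iteration order, as a formatter would.
    static String render(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // HashMap: order is unspecified and may differ across JVM vendors/versions.
        System.out.print(render(fillCounts(new HashMap<String, Integer>())));
        // TreeMap: always ascending key order, on every JVM.
        System.out.print(render(fillCounts(new TreeMap<String, Integer>())));
    }
}
```

Because the day keys sort lexicographically in date order, a TreeMap also
gives output that reads chronologically.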
b. org.apache.accumulo.core.security.crypto.BlockedIOStreamTest.testGiantWrite
assumes a max heap of about 1GB. It fails on the IBM JRE because the build
does not specify a max heap, and the IBM JRE's default max heap depends
on the OS (see
http://www-01.ibm.com/support/knowledgecenter/SSYKE2_6.0.0/com.ibm.java.doc.diagnostics.60/diag/appendixes/defaults.html?lang=en
).
Proposal: add -Xmx1g to the surefire Maven plugin configuration in the
parent Maven POM.
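A quick way to see what max heap a given JVM actually ends up with (for
example, before and after adding -Xmx1g) is Runtime.maxMemory(). This is
only a diagnostic sketch; the class name and the hasHeapOfAtLeast helper are
hypothetical. Note that on some JVMs the reported value is somewhat below the
-Xmx setting, so any guard built on it should leave a margin.

```java
public class HeapCheck {

    static final long MB = 1024L * 1024L;

    // Runtime.maxMemory() returns the maximum heap the JVM will attempt to use
    // (Long.MAX_VALUE if there is no inherent limit).
    static boolean hasHeapOfAtLeast(long bytes) {
        return Runtime.getRuntime().maxMemory() >= bytes;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("java.vendor = " + System.getProperty("java.vendor"));
        System.out.println("max heap    = " + (max / MB) + " MB");
        // Illustrative threshold with a margin below 1GB.
        System.out.println(">= 900 MB   = " + hasHeapOfAtLeast(900 * MB));
    }
}
```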
c. Both org.apache.accumulo.core.security.crypto.CryptoTest and
org.apache.accumulo.core.file.rfile.RFileTest have many failures due to
calls to SecureRandom with the random number generator provider hard-coded
as SUN. The IBM JRE has its own built-in RNG provider, IBMJCE. There are two
issues: hard-coded calls to SecureRandom.getInstance(<algo>,"SUN"), and
the default value "SUN" in the Property class.
Proposal: add a mechanism, via a new annotation in the Property class, that
allows a property's default value to be overridden by a Java system property.
The only usage would be Property.CRYPTO_SECURE_RNG_PROVIDER.
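A minimal sketch of both parts of this proposal, under stated assumptions:
the annotation name (SystemPropertyOverride), the system property name
(accumulo.crypto.secure.rng.provider) and the trimmed-down Prop enum are all
hypothetical, not part of Accumulo's actual Property class. The sketch also
falls back to any installed provider when the requested one (e.g. SUN on an
IBM JRE) is missing; that fallback is an extra illustration, not something
the proposal requires.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.security.NoSuchAlgorithmException;
import java.security.NoSuchProviderException;
import java.security.SecureRandom;

public class RngProviderSelection {

    // Hypothetical annotation: names a Java system property that, when set,
    // overrides the property's built-in default value.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SystemPropertyOverride {
        String value();
    }

    // Hypothetical, heavily trimmed stand-in for Accumulo's Property enum.
    enum Prop {
        @SystemPropertyOverride("accumulo.crypto.secure.rng.provider")
        CRYPTO_SECURE_RNG_PROVIDER("SUN");

        private final String builtInDefault;

        Prop(String builtInDefault) {
            this.builtInDefault = builtInDefault;
        }

        String getDefaultValue() {
            try {
                // Enum constants are public static fields, so getField() finds them.
                SystemPropertyOverride override = Prop.class.getField(name())
                        .getAnnotation(SystemPropertyOverride.class);
                if (override != null) {
                    return System.getProperty(override.value(), builtInDefault);
                }
            } catch (NoSuchFieldException e) {
                // Not expected for enum constants; fall through to the built-in default.
            }
            return builtInDefault;
        }
    }

    // Request the algorithm from the named provider; if that provider is not
    // installed, use whichever installed provider supplies the algorithm.
    static SecureRandom secureRandom(String algorithm, String provider)
            throws NoSuchAlgorithmException {
        try {
            return SecureRandom.getInstance(algorithm, provider);
        } catch (NoSuchProviderException e) {
            return SecureRandom.getInstance(algorithm);
        }
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String provider = Prop.CRYPTO_SECURE_RNG_PROVIDER.getDefaultValue();
        SecureRandom rng = secureRandom("SHA1PRNG", provider);
        System.out.println(rng.getAlgorithm() + " from " + rng.getProvider().getName());
    }
}
```

With this in place, running the tests with
-Daccumulo.crypto.secure.rng.provider=IBMJCE would switch the default on IBM
JREs while leaving SUN as the default everywhere else.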
2. Environment/Configuration
a. The generated configuration files contain GC parameters that are
specific to the Sun JVM. In accumulo-env.sh, ACCUMULO_TSERVER_OPTS contains
-XX:NewSize and -XX:MaxNewSize, and ACCUMULO_GENERAL_OPTS uses
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75.
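A sketch of how the vendor could drive the choice of GC flags. The
Sun/Oracle flags are the ones quoted above; the IBM equivalents
(-Xgcpolicy:gencon for the generational concurrent collector, -Xmn for the
nursery size) are an assumed mapping, not a tested configuration, and there
is no direct IBM counterpart shown for CMSInitiatingOccupancyFraction. The
class and method names and the "1G" size are hypothetical. Checking
java.vendor only describes the JVM running this code, which is why the
single proposal in this section asks the user to pick a vendor in
bootstrap_config.sh instead.

```java
import java.util.Arrays;
import java.util.List;

public class GcOptions {

    // "java.vendor" is a standard system property, e.g. "IBM Corporation".
    static boolean isIbmJvm(String vendor) {
        return vendor != null && vendor.contains("IBM");
    }

    // Returns vendor-appropriate GC flags for a given young-generation size.
    static List<String> gcOptions(String vendor, String youngGenSize) {
        if (isIbmJvm(vendor)) {
            // Assumed IBM J9 mapping: generational concurrent policy plus nursery size.
            return Arrays.asList("-Xgcpolicy:gencon", "-Xmn" + youngGenSize);
        }
        // Flags currently generated into accumulo-env.sh for Sun/Oracle.
        return Arrays.asList(
                "-XX:NewSize=" + youngGenSize,
                "-XX:MaxNewSize=" + youngGenSize,
                "-XX:+UseConcMarkSweepGC",
                "-XX:CMSInitiatingOccupancyFraction=75");
    }

    public static void main(String[] args) {
        String vendor = System.getProperty("java.vendor");
        System.out.println(vendor + " -> " + gcOptions(vendor, "1G"));
    }
}
```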
b. Running bin/accumulo fails with a ClassNotFoundException because the
script specifies the JAXP DocumentBuilderFactory implementation:
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
This Sun implementation of DocumentBuilderFactory does not exist in the
IBM JDK, so a ClassNotFoundException is thrown when the accumulo script is
run.
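One way to confirm this on a given JDK is to check whether the hard-coded
factory class can be loaded at all, and to see which factory JAXP picks when
the system property is not set. This is a hypothetical diagnostic, not code
from Accumulo.

```java
import javax.xml.parsers.DocumentBuilderFactory;

public class JaxpFactoryCheck {

    // The implementation class named in bin/accumulo.
    static final String SUN_XERCES_FACTORY =
            "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    // True if the class can be found, without initializing it.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, JaxpFactoryCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(SUN_XERCES_FACTORY + " present: "
                + isClassAvailable(SUN_XERCES_FACTORY));
        // With no -Djavax.xml.parsers.DocumentBuilderFactory, newInstance()
        // uses JAXP's normal lookup and ends at the JRE's own default factory.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("default factory: " + factory.getClass().getName());
    }
}
```

If the default factory works on both JDKs, simply omitting the -D setting
for IBM (as proposed below) is enough.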
c. MiniAccumuloCluster: MiniAccumuloClusterImpl passes Sun-specific GC
parameters to the java process it launches (similar to item a).
A single proposal for solving all three issues above: enhance
bootstrap_config.sh to prompt for the Java vendor. The selection will set
the correct GC parameters (which differ between IBM and Sun) and include or
omit the JAXP setting. MiniAccumuloClusterImpl can then read the same
environment variable that holds the GC parameters and use it in the exec
command.
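A rough sketch of the MiniAccumuloClusterImpl side: read the GC options from
an environment variable and splice them into the java command line. The
variable name ACCUMULO_GC_JVM_OPTS, the class name and the main class passed
in are hypothetical placeholders; the actual exec code in
MiniAccumuloClusterImpl is not reproduced here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class MacCommandSketch {

    // Hypothetical variable that bootstrap_config.sh would write into accumulo-env.sh.
    static final String GC_OPTS_VAR = "ACCUMULO_GC_JVM_OPTS";

    // Builds: <java.home>/bin/java [GC options from env...] <mainClass>
    static List<String> buildCommand(Map<String, String> env, String mainClass) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(System.getProperty("java.home") + File.separator + "bin"
                + File.separator + "java");
        String gcOpts = env.get(GC_OPTS_VAR);
        if (gcOpts != null && !gcOpts.trim().isEmpty()) {
            // Whitespace-separated flags; options containing spaces are not supported.
            cmd.addAll(Arrays.asList(gcOpts.trim().split("\\s+")));
        }
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(System.getenv(), "example.PlaceholderMain");
        System.out.println(cmd);
        // new ProcessBuilder(cmd).start() would then launch the process.
    }
}
```

Splitting on whitespace is adequate for GC flags like the ones above; when
the variable is unset, the command simply carries no GC options and the JVM
uses its own defaults.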
So far, my work has focused on getting the unit tests to pass cleanly on all
Java vendors. I have not yet done intensive testing on real clusters with
these changes, and would welcome pointers to anything else that might need
attention.
I would also like to hear whether these changes make sense and, if so,
whether I should go ahead and create JIRAs and attach my patches for commit
approval.
Looking forward to hearing feedback!
Regards,
Hayden Marchant
Software Architect
IBM BigInsights, IBM