Thanks for the great write-up! A few comments below.
On 11/09/2015 06:07, Robert Muir wrote:
2. we have a "jar hell detector" that threw an
UnsupportedOperationException, because classloader is no longer a
URLClassLoader, so we can't get the list of urls. This caused all
tests to fail. I changed the code to parse java.class.path.
Right, code should never assume that the application class loader is
implemented as a URLClassLoader (more on this in JEP 261).
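For illustration, here is a minimal sketch of the java.class.path approach mentioned above. The class and method names (ClassPathUrls, parse) are hypothetical and not taken from the project's jar hell detector. Empty entries are simply skipped here, which is a simplification.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: derives class path URLs from the java.class.path
// system property instead of casting the class loader to URLClassLoader.
final class ClassPathUrls {

    static List<URL> parse(String classPath) {
        List<URL> urls = new ArrayList<>();
        if (classPath == null || classPath.isEmpty()) {
            return urls;
        }
        // Entries are separated by the platform path separator (":" or ";").
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue; // simplification: ignore empty entries
            }
            try {
                urls.add(Paths.get(entry).toUri().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("bad class path entry: " + entry, e);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        for (URL url : parse(System.getProperty("java.class.path"))) {
            System.out.println(url);
        }
    }
}
```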
3. we have a "jvm info" api that provides information about the jvm,
e.g. to assist our engineers in debugging different nodes in the
cluster. it was not prepared to handle UnsupportedOperationException
from RuntimeMXBean.getBootClassPath: I fixed it to fall back to
sun.boot.class.path, otherwise fall back to "unknown".
This is another behavior change. RuntimeMXBean.getBootClassPath() has
always specified that it can throw UOE but the JDK has not needed to do
this until now. The alternative choice here is to return an empty string
but that might cause issues too.
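The fallback chain described in item 3 could look roughly like the sketch below. The class name BootClassPathInfo is hypothetical. The sketch only uses RuntimeMXBean.isBootClassPathSupported(), RuntimeMXBean.getBootClassPath() and the sun.boot.class.path system property, which may simply be absent.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Hypothetical helper: reports the boot class path for diagnostics and
// degrades gracefully when the runtime does not support the query.
final class BootClassPathInfo {

    static String bootClassPath() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        try {
            if (runtime.isBootClassPathSupported()) {
                return runtime.getBootClassPath();
            }
        } catch (UnsupportedOperationException e) {
            // Specified behavior when the boot class path mechanism is not supported.
        }
        // Fall back to the system property, which may not be set.
        String property = System.getProperty("sun.boot.class.path");
        return property != null ? property : "unknown";
    }

    public static void main(String[] args) {
        System.out.println("boot class path: " + bootClassPath());
    }
}
```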
4. exception serialization tests failed, because we manually serialize
exceptions. We previously used java serialization, but it causes
serious trouble because of backwards compatibility breaks between even
minor jdk versions: this would strike when users try to upgrade their
jvms for nodes in their cluster with a rolling restart. The tests fail
because the stacktrace "loses" stuff after deserialization (the module
version). For now i just disabled the tests on java 9, because I don't
know how we can support e.g. java 8 and java 9 and populate this stuff
"optionally" yet without more digging.
Stack traces have been updated to optionally include the module and
version but this should be a compatible change (except maybe for code
that parses the String representation). As you mention, these tests
would need to be updated any time that there are new fields added to the
serial form (for standard/Java SE types then this should only be major
releases).
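One way to populate the module information "optionally" while still compiling against Java 8 is a reflective lookup, sketched below. This assumes the accessor is named getModuleName, as in the final JDK 9 API; the name in the EA builds discussed here may differ. The class name StackFrameModuleInfo is hypothetical.

```java
import java.lang.reflect.Method;

// Hypothetical helper: reads the module name of a stack frame if the
// running JDK exposes it, and returns null otherwise (e.g. on Java 8).
final class StackFrameModuleInfo {

    static String moduleNameOrNull(StackTraceElement frame) {
        try {
            // Assumption: accessor name as in the final JDK 9 API.
            Method accessor = StackTraceElement.class.getMethod("getModuleName");
            return (String) accessor.invoke(frame);
        } catch (ReflectiveOperationException e) {
            return null; // accessor not present on this JDK
        }
    }

    public static void main(String[] args) {
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            System.out.println(frame + " module=" + moduleNameOrNull(frame));
        }
    }
}
```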
5. we have monitoring apis that provide basic system information,
similar to #3, for debugging purposes, and to feed monitoring tools so
people can track the health of the cluster. previously, we used the
sigar library (JNI) for this, but it has bugs that caused user
crashes. So we were forced to limit ourselves to what is provided with
java management apis: which is much less, but we figure it has the
basics. For some very basic stats, this means we also look for
com.sun.management apis
(https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/package-summary.html)
and if they are available, we provide the stuff available there too,
like how much ram is on the machine, swap in use, number of open/max
file descriptors, and so on. We test what is available and what is not
based on platform so we can detect if something changes in the JDK,
like what happens with jigsaw, where they all become unavailable.
I'm not sure that I understand the issue here but just to say that the
com.sun.management API is a documented/supported API and it is exported by
module jdk.management:
    $ java -listmods:jdk.management
    jdk.management@9.0
      requires public java.management
      requires mandated java.base
      exports com.sun.management
      conceals com.sun.management.internal
      provides sun.management.spi.PlatformMBeanProvider with
          com.sun.management.internal.PlatformMBeanProviderImpl
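As a sketch of the "use it if it is available" probing described in item 5, the code below looks up the exported com.sun.management.OperatingSystemMXBean interface by name and calls one of its methods reflectively. The class name OsStats and the -1 "not available" convention are hypothetical. The method is looked up on the exported interface, not on the implementation class.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

// Hypothetical helper: reads total physical memory through the
// com.sun.management extension when present, else returns -1.
final class OsStats {

    static long totalPhysicalMemoryOrMinusOne() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        try {
            // Look up the exported interface, not the internal implementation class.
            Class<?> extension = Class.forName("com.sun.management.OperatingSystemMXBean");
            if (!extension.isInstance(os)) {
                return -1;
            }
            Method getter = extension.getMethod("getTotalPhysicalMemorySize");
            return (Long) getter.invoke(os);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return -1; // extension missing or not accessible on this runtime
        }
    }

    public static void main(String[] args) {
        System.out.println("total physical memory: " + totalPhysicalMemoryOrMinusOne());
    }
}
```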
6. cluster snapshot/restore to amazon s3 does not work, because of
their use of internal ssl libraries. I've tried to get them to fix it
for a while now (https://github.com/aws/aws-sdk-java/pull/432). This
is also a serious loss of functionality, if they won't fix it, I guess
we have to fork the aws sdk.
You can work around this with
-XaddExport:java.base/sun.security.ssl=ALL-UNNAMED of course but much
better if they could understand and remove the dependency on these
internal classes.
8. during testing I hit some kind of bug, where the thai break
iterator returned wrong information. This might be hotspot-related or
something else, and it never reproduced again. We use this check
(https://github.com/apache/lucene-solr/blob/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizer.java#L37-L47)
to see if we can "really" tokenize thai, otherwise we throw an
exception. For some IBM JVM versions at least in the past, they did
not have a breakiterator for thai. I guess it just goes to show the EA
build is really a prototype, and not yet ready to be added to our CI
servers and so on... which is the only way I can ensure this huge
codebase stays working with jigsaw.
Just so I understand, the Thai break iterator issue was with the jigsaw
EA builds and not the regular JDK 9 builds, right? And it only happened
once and you can't reproduce it? This is a bit worrisome. All I can say is
that there are a lot of changes in this area; a lot of technical debt
related to the split between the java.base and jdk.localedata modules
had to be addressed. Offhand, I can't think of anything that would
lead to an intermittent issue. If you find out more on this then please
send mail.
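For anyone trying to reproduce this outside the test suite, a standalone probe along the lines of the linked ThaiTokenizer check might look like the sketch below. It is a paraphrase from memory, not a copy of the Lucene code. The class name ThaiBreakCheck is hypothetical, and the sample text is written with Unicode escapes to avoid source encoding issues.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical probe: checks whether the JDK's Thai word BreakIterator
// finds a word boundary inside a short Thai phrase.
final class ThaiBreakCheck {

    // Seven Thai characters; a dictionary-based iterator is expected to
    // report a word boundary at offset 4.
    static final String SAMPLE = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";

    static boolean thaiBreaksAvailable() {
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        words.setText(SAMPLE);
        return words.isBoundary(4);
    }

    public static void main(String[] args) {
        System.out.println("Thai word breaking available: " + thaiBreaksAvailable());
    }
}
```

Running it in a loop on the EA build might help to tell an intermittent failure from a missing locale data module.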
-Alan