Thanks for the great write-up! A few comments below.

On 11/09/2015 06:07, Robert Muir wrote:
2. we have a "jar hell detector" that threw an
UnsupportedOperationException, because classloader is no longer a
URLClassLoader, so we can't get the list of urls. This caused all
tests to fail. I changed the code to parse java.class.path.
Right, code should never assume that the application class loader is implemented as a URLClassLoader (more on this in JEP 261).
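For reference, here is a minimal sketch of the java.class.path approach described above. It assumes a plain class-path launch, so entries that a custom class loader adds at run time are not visible this way. The class and method names are illustrative only.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClassPathEntries {
    // Splits a class-path string on the platform separator (':' on Unix, ';' on Windows).
    // This sketch simply drops empty segments.
    static List<String> parse(String classPath) {
        List<String> entries = new ArrayList<>();
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    // Reads the class path from the java.class.path property instead of casting
    // the application class loader to URLClassLoader.
    static List<String> classPathEntries() {
        return parse(System.getProperty("java.class.path", ""));
    }

    public static void main(String[] args) {
        for (String entry : classPathEntries()) {
            System.out.println(entry);
        }
    }
}
```

The same code runs on JDK 8 and JDK 9 because it never depends on the concrete type of the class loader.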


3. we have a "jvm info" api that provides information about the jvm,
e.g. to assist our engineers in debugging different nodes in the
cluster. it was not prepared to handle UnsupportedOperationException
from RuntimeMXBean.getBootClassPath: I fixed it to fall back to
sun.boot.class.path, otherwise fall back to "unknown".
This is another behavior change. The specification of RuntimeMXBean.getBootClassPath() has always allowed it to throw UOE, but the JDK has not needed to do this until now. The alternative would be to return an empty string, but that might cause issues too.
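A minimal sketch of the fallback chain described above, assuming the "jvm info" code only needs a display string. It checks RuntimeMXBean.isBootClassPathSupported() first and still catches UOE, since the specification permits it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

public class BootClassPathInfo {
    // Returns the boot class path if the JVM supports it, otherwise the
    // sun.boot.class.path property, otherwise "unknown".
    static String bootClassPath() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        try {
            if (runtime.isBootClassPathSupported()) {
                return runtime.getBootClassPath();
            }
        } catch (UnsupportedOperationException e) {
            // The specification allows getBootClassPath() to throw UOE; fall through.
        }
        return System.getProperty("sun.boot.class.path", "unknown");
    }

    public static void main(String[] args) {
        System.out.println("boot class path: " + bootClassPath());
    }
}
```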

4. exception serialization tests failed, because we manually serialize
exceptions. We previously used java serialization, but it causes
serious trouble because of backwards compatibility breaks between even
minor jdk versions: this would strike when users try to upgrade their
jvms for nodes in their cluster with a rolling restart. The tests fail
because the stacktrace "loses" stuff after deserialization (the module
version). For now I just disabled the tests on java 9, because I don't
know how we can support e.g. java 8 and java 9 and populate this stuff
"optionally" yet without more digging.
Stack traces have been updated to optionally include the module name and version, but this should be a compatible change (except perhaps for code that parses the String representation). As you mention, these tests would need to be updated any time new fields are added to the serial form (for standard Java SE types, this should only happen in major releases).
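The project's own serializer isn't shown in the thread, so the following is a hypothetical minimal codec. It illustrates the point about test expectations: it writes only the StackTraceElement fields that exist on every JDK and compares just those fields after a round trip, rather than relying on equals(), which on Java 9+ also looks at the class loader and module fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Objects;

public class StackTraceCodec {
    // Writes only the fields every JDK has (class, method, file, line); the class loader
    // name and module name/version added in Java 9 are deliberately dropped.
    static void write(DataOutputStream out, StackTraceElement e) throws IOException {
        out.writeUTF(e.getClassName());
        out.writeUTF(e.getMethodName());
        String file = e.getFileName();
        out.writeBoolean(file != null);
        if (file != null) {
            out.writeUTF(file);
        }
        out.writeInt(e.getLineNumber());
    }

    static StackTraceElement read(DataInputStream in) throws IOException {
        String cls = in.readUTF();
        String method = in.readUTF();
        String file = in.readBoolean() ? in.readUTF() : null;
        int line = in.readInt();
        return new StackTraceElement(cls, method, file, line);
    }

    static StackTraceElement roundTrip(StackTraceElement e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        write(out, e);
        out.flush();
        return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    }

    // Compares only the fields the codec preserves. On Java 9+, equals() also compares
    // class loader and module fields, so it can report a mismatch after a round trip.
    static boolean sameFrame(StackTraceElement a, StackTraceElement b) {
        return a.getClassName().equals(b.getClassName())
                && a.getMethodName().equals(b.getMethodName())
                && Objects.equals(a.getFileName(), b.getFileName())
                && a.getLineNumber() == b.getLineNumber();
    }

    public static void main(String[] args) throws IOException {
        StackTraceElement original = new Throwable().getStackTrace()[0];
        StackTraceElement copy = roundTrip(original);
        System.out.println("original:   " + original);
        System.out.println("copy:       " + copy);
        System.out.println("same frame: " + sameFrame(original, copy));
        System.out.println("equals():   " + original.equals(copy));
    }
}
```

A test written against sameFrame() keeps passing on both Java 8 and Java 9, at the cost of explicitly ignoring the new fields.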


5. we have monitoring apis that provide basic system information,
similar to #3, for debugging purposes, and to feed monitoring tools so
people can track the health of the cluster. previously, we used the
sigar library (JNI) for this, but it has bugs that caused crashes for
users. So we were forced to limit ourselves to what is provided with
java management apis: which is much less, but we figure it has the
basics. For some very basic stats, this means we also look for
com.sun.management apis
(https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/package-summary.html)
and if they are available, we provide the stuff available there too,
like how much ram is on the machine, swap in use, number of open/max
file descriptors, and so on. We test what is available and what is not
based on platform so we can detect if something changes in the JDK,
like what happens with jigsaw, where they all become unavailable.
I'm not sure that I understand the issue here, but just to say that the com.sun.management API is a documented/supported API and it is exported by module jdk.management:

$ java -listmods:jdk.management

jdk.management@9.0
  requires public java.management
  requires mandated java.base
  exports com.sun.management
  conceals com.sun.management.internal
  provides sun.management.spi.PlatformMBeanProvider with com.sun.management.internal.PlatformMBeanProviderImpl
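As a sketch of how the per-platform probing can be done against the exported API, assuming jdk.management is resolved at compile and run time (it is a default root module for class-path applications on a full JDK, since it exports com.sun.management): instanceof checks on the platform MXBean detect which extension interfaces are implemented, e.g. UnixOperatingSystemMXBean is not implemented on Windows. The stat names in the map are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

public class OsStats {
    // Collects extended OS stats only when the com.sun.management interfaces are implemented.
    static Map<String, Long> collect() {
        Map<String, Long> stats = new LinkedHashMap<>();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean ext =
                    (com.sun.management.OperatingSystemMXBean) os;
            stats.put("totalSwapSpaceSize", ext.getTotalSwapSpaceSize());
            stats.put("freeSwapSpaceSize", ext.getFreeSwapSpaceSize());
            stats.put("committedVirtualMemorySize", ext.getCommittedVirtualMemorySize());
        }
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            stats.put("openFileDescriptorCount", unix.getOpenFileDescriptorCount());
            stats.put("maxFileDescriptorCount", unix.getMaxFileDescriptorCount());
        }
        return stats;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```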



6. cluster snapshot/restore to amazon s3 does not work, because of
their use of internal ssl libraries. I've tried to get them to fix it
for a while now (https://github.com/aws/aws-sdk-java/pull/432). This
is also a serious loss of functionality; if they won't fix it, I guess
we have to fork the aws sdk.
You can work around this with -XaddExport:java.base/sun.security.ssl=ALL-UNNAMED, of course, but it would be much better if they could understand and remove the dependency on these internal classes.
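If the flag workaround is used in the meantime, a small run-time check can confirm whether it took effect. This is a sketch for Java 9 and later: it asks java.base whether a package is exported to the module of the calling class (the unnamed module when loaded from the class path). In the released JDK 9 the flag is spelled --add-exports java.base/sun.security.ssl=ALL-UNNAMED. The result for sun.security.ssl depends on the JDK release and launcher options, so only java.lang gives a fixed answer.

```java
public class ExportCheck {
    // Returns true if java.base exports the package to the module containing this class
    // (the unnamed module when the class is loaded from the class path).
    static boolean exportedFromJavaBase(String pkg) {
        Module javaBase = Object.class.getModule();
        return javaBase.isExported(pkg, ExportCheck.class.getModule());
    }

    public static void main(String[] args) {
        System.out.println("java.lang exported: " + exportedFromJavaBase("java.lang"));
        System.out.println("sun.security.ssl exported: " + exportedFromJavaBase("sun.security.ssl"));
    }
}
```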



8. during testing I hit some kind of bug, where the Thai break
iterator returned wrong information. This might be hotspot-related or
something else, and it never reproduced again. We use this check
(https://github.com/apache/lucene-solr/blob/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/th/ThaiTokenizer.java#L37-L47)
to see if we can "really" tokenize Thai, otherwise we throw an
exception. For some IBM JVM versions at least in the past, they did
not have a break iterator for Thai. I guess it just goes to show the EA
build is really a prototype, and not yet ready to be added to our CI
servers and so on... which is the only way I can ensure this huge
codebase stays working with jigsaw.
Just so I understand, the Thai break iterator issue was with the Jigsaw EA builds and not the regular JDK 9 builds, right? And it only happened once and you can't reproduce it. This is a bit worrisome. All I can say is that there are a lot of changes in this area: a lot of technical debt related to the split between java.base and the jdk.localedata module had to be addressed. Offhand, I can't think of anything that would lead to an intermittent issue. If you find out more on this, then please send mail.
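For anyone trying to reproduce this, here is a sketch of the kind of probe the linked Lucene check performs. It is an approximation, not a copy of that code: it assumes the sample word "Thai language" (four characters for "language" followed by three for "Thai") and treats a word boundary at offset 4 as evidence that a dictionary-based Thai break iterator is available.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class ThaiBreakProbe {
    // The Thai word for "Thai language", written with escapes to avoid source-encoding issues:
    // four characters for "language" followed by three characters for "Thai".
    static final String SAMPLE = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";

    // Returns true if the Thai word iterator places a boundary between the two words,
    // i.e. a dictionary-based Thai break iterator appears to be available.
    static boolean thaiWordBreaksAvailable() {
        BreakIterator it = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        it.setText(SAMPLE);
        return it.isBoundary(4);
    }

    public static void main(String[] args) {
        System.out.println("Thai word breaks available: " + thaiWordBreaksAvailable());
    }
}
```

Running such a probe repeatedly on the EA build is one cheap way to look for the intermittent failure.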

-Alan
