(decision and 4 choices at the bottom -- feedback requested)
I did some studying of the zip file format and determined that part of the
reworked xbean-finder Archive API was plain wrong.
Using maps as an analogy, here is how we were effectively scanning zips (jars):
"Style A"
    Map<String, InputStream> zip = new HashMap<String, InputStream>();
    for (String entryName : zip.keySet()) {
        InputStream inputStream = zip.get(entryName);
        // scan the stream
    }
While there is some indexing in a zip file in what is called the central
directory, it isn't nearly good enough to support this type of random access.
The actual reading is done in C code when a zip file is randomly accessed in
this way, but basically it seems about as slow as starting at the beginning of
a stream and reading ahead in the stream until the index is hit and then
reading for "real". I doubt it's doing exactly that as in C code you should be
able to start in the middle of a file, but let's put it this way... at the very
minimum you are reading the Central Directory on each and every random
access.
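For reference, the "Style A" access pattern can be sketched with the plain JDK zip classes. This is only an illustration of the lookup-by-name pattern, not the xbean-finder Archive code; the class name RandomAccessZipScan and its helper methods are made up for this example, and it makes no claim about reproducing the timings discussed in this mail.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of xbean-finder.
public class RandomAccessZipScan {

    /** Writes a small temporary zip whose entries contain their own names as data. */
    static Path sampleZip(String... names) {
        try {
            Path file = Files.createTempFile("style-a", ".zip");
            file.toFile().deleteOnExit();
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(file))) {
                for (String name : names) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(name.getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Style A: collect the entry names first, then look every entry up again by name. */
    static List<String> scanByName(Path zipPath) {
        List<String> seen = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            // Equivalent of zip.keySet() in the map analogy.
            List<String> names = new ArrayList<>();
            for (ZipEntry e : Collections.list(zip.entries())) {
                names.add(e.getName());
            }
            byte[] buffer = new byte[4096];
            for (String name : names) {
                // Equivalent of zip.get(entryName): a separate lookup per entry.
                ZipEntry entry = zip.getEntry(name);
                try (InputStream in = zip.getInputStream(entry)) {
                    int total = 0;
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        total += n; // stand-in for "scan the stream"
                    }
                    seen.add(name + ":" + total);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(scanByName(sampleZip("org/Foo.class", "org/Bar.class")));
    }
}
```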
I've reworked the Archive API so that when you iterate over it, you iterate
over actual entries. Using map again as an analogy it looks like this now:
"Style B"
    for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
        String className = entry.getKey();
        InputStream inputStream = entry.getValue();
        // scan the stream
    }
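A minimal sketch of what "Style B" can look like with the plain JDK classes, assuming a single forward pass with java.util.zip.ZipInputStream. It is not the reworked Archive API itself; SequentialZipScan and its methods are hypothetical names, and the in-memory sample zip exists only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of xbean-finder.
public class SequentialZipScan {

    /** Builds a small in-memory zip whose entries contain their own names as data. */
    static byte[] sampleZip(String... names) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(bytes)) {
                for (String name : names) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(name.getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Style B: walk the entries in jar order and read each one as it is reached. */
    static List<String> scanInOrder(byte[] zipBytes) {
        List<String> seen = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            byte[] buffer = new byte[4096];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (!entry.getName().endsWith(".class")) {
                    continue; // getNextEntry() skips the rest of this entry
                }
                int total = 0;
                int n;
                while ((n = in.read(buffer)) != -1) {
                    total += n; // stand-in for "scan the stream"
                }
                seen.add(entry.getName() + ":" + total);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        byte[] zip = sampleZip("org/Foo.class", "META-INF/MANIFEST.MF", "org/Bar.class");
        // prints [org/Foo.class:13, org/Bar.class:13]
        System.out.println(scanInOrder(zip));
    }
}
```

The name and the stream arrive together, so there is no second lookup per entry, which is the point of the entrySet() analogy.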
Using Atlassian Confluence as a driver to benchmark only the call to 'new
AnnotationFinder(archive)', which is where our scanning happens, here are the
results before (Style A) and after (Style B):
StyleA: 8.89s - 9.02s
StyleB: 3.33s - 3.52s
Now unfortunately the 'link()' call used to resolve parent classes that are not
in the jars scanned as well as to resolve meta-annotations still needs the
StyleA random access. These things don't involve going in "jar order", but
definitely are random access. With the new and improved code that scans
Confluence at around 3.4s, here is the time with 'link()' added:
StyleB scan + StyleA link: 15.61s - 15.75s
That link() call adds another 12 seconds, roughly equivalent to the cost of 4
more scans.
So the good news is we don't need the link. We very much like the link, but we
don't need the link for Java EE 6 certification. We have two very excellent
features associated with that linking.
- Meta-Annotations
- JAX-RS discovery of non-annotated Application subclasses (Application is a
concrete class you subclass, like HttpServlet)
We have more or less 4 kinds of choices on how we deal with this:
1. Link() is always called. (always slow, extra features always enabled)
2. Link() can be disabled but is enabled by default. (slow, w/optional fast
flag, extra features enabled by default)
3. Link() can be enabled but is disabled by default. (fast, w/optional slow
flag, extra features disabled by default)
4. Link() is never enabled. (always fast, extra features permanently disabled)
Thoughts, preferences?
-David