added a patch: https://issues.apache.org/jira/browse/XBEAN-206
can you test it against your mini bench please?

- Romain

2012/4/15 Romain Manni-Bucau <[email protected]>

> Hi David,
>
> for me only 1 should be done.
>
> well, i didn't understand the whole mail: why do we need to browse the zip
> file multiple times? only for the getbytecode method? i think we can get
> rid of multiple scannings and keep the link() features. Another point is
> getAnnotatedClasses() should be able to return something even when link()
> was not called.
>
> If the zip parsing is badly done by the jre (if it doesn't use fseek for
> instance) we simply have to rewrite it.
>
> well in org.apache.xbean.finder.archive.JarArchive.JarIterator#JarIterator
> why is JarFile not used when possible?
>
> - Romain
>
>
>
> 2012/4/15 David Blevins <[email protected]>
>
>> (decision and 4 choices at the bottom -- feedback requested)
>>
>> I did some studying of the zip file format and determined that part of
>> the reworked xbean-finder Archive API was plain wrong.
>>
>> Using maps as an analogy here is how we were effectively scanning zips
>> (jars):
>>
>> "Style A"
>>
>>     Map<String, InputStream> zip = new HashMap<String, InputStream>();
>>     for (String entryName : zip.keySet()) {
>>         InputStream inputStream = zip.get(entryName);
>>         // scan the stream
>>     }
>>
>> While there is some indexing in a zip file in what is called the central
>> directory, it isn't nearly good enough to support this type of random
>> access. The actual reading is done in C code when a zip file is randomly
>> accessed in this way, but basically it seems about as slow as starting at
>> the beginning of a stream and reading ahead in the stream until the index
>> is hit and then reading for "real". I doubt it's doing exactly that as in
>> C code you should be able to start in the middle of a file, but let's put
>> it this way... at the very minimum you are reading the Central Directory
>> each and every single random access.
>>
>> I've reworked the Archive API so that when you iterate over it, you
>> iterate over actual entries. Using map again as an analogy it looks like
>> this now:
>>
>> "Style B"
>>
>>     for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
>>         String className = entry.getKey();
>>         InputStream inputStream = entry.getValue();
>>         // scan the stream
>>     }
>>
>>
>> Using Atlassian Confluence as a driver to benchmark only the call to 'new
>> AnnotationFinder(archive)' which is where our scanning happens, here are
>> the results before (Style A) and after (Style B):
>>
>>
>> StyleA: 8.89s - 9.02s
>> StyleB: 3.33s - 3.52s
>>
>> Now unfortunately the 'link()' call used to resolve parent classes that
>> are not in the jars scanned as well as to resolve meta-annotations still
>> needs the StyleA random access. These things don't involve going in "jar
>> order", but definitely are random access. With the new and improved code
>> that scans Confluence at around 3.4s, here is the time with 'link()' added
>>
>> StyleB scan + StyleA link: 15.61s - 15.75s
>>
>> That link() call adds another 12 seconds. Roughly equivalent to the cost
>> of 4 more scans.
>>
>> So the good news is we don't need the link. We very much like the link,
>> but we don't need the link for Java EE 6 certification. We have two very
>> excellent features associated with that linking.
>>
>>  - Meta-Annotations
>>  - JAX-RS discovery of non-annotated Application subclasses (Application
>>    is a concrete class you subclass, like HttpServlet)
>>
>> We have more or less 4 kinds of choices on how we deal with this:
>>
>>  1. Link() is always called. (always slow, extra features always enabled)
>>  2. Link() can be disabled but is enabled by default. (slow, w/optional
>>     fast flag, extra features enabled by default)
>>  3. Link() can be enabled but is disabled by default. (fast, w/optional
>>     slow flag, extra features disabled by default)
>>  4. Link is never enabled. (always fast, extra features permanently
>>     disabled)
>>
>>
>> Thoughts, preferences?
>>
>>
>> -David
>>
>>
>
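
For reference, the "Style A" / "Style B" map analogy quoted above can be sketched against the real java.util.jar classes. This is only an illustration of the two access patterns, not the xbean-finder code: the class name ZipScanStyles, its methods, and the sample entry names are all made up for the example, and using JarFile for Style A and JarInputStream for Style B is an assumption about what the analogy maps to. It says nothing about timings; how slow the by-name lookups are depends on how they are implemented (for instance whether the archive is reopened for each lookup).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical demo class; none of these names come from xbean-finder.
class ZipScanStyles {

    // Builds a tiny throwaway jar so the example is self-contained.
    static Path createSampleJar() throws IOException {
        Path jar = Files.createTempFile("scan-demo", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            writeEntry(jos, "org/example/Foo.class", "foo-bytes");
            writeEntry(jos, "org/example/Bar.class", "bar");
        }
        return jar;
    }

    private static void writeEntry(JarOutputStream jos, String name, String body)
            throws IOException {
        jos.putNextEntry(new JarEntry(name));
        jos.write(body.getBytes(StandardCharsets.UTF_8));
        jos.closeEntry();
    }

    // "Style A": collect the entry names first, then fetch each stream by name.
    static Map<String, Integer> scanStyleA(Path jar) throws IOException {
        Map<String, Integer> sizes = new TreeMap<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            List<String> names = new ArrayList<>();
            for (JarEntry e : Collections.list(jarFile.entries())) {
                if (!e.isDirectory()) names.add(e.getName());
            }
            for (String name : names) {
                // Separate lookup by name for every entry.
                try (InputStream in = jarFile.getInputStream(jarFile.getEntry(name))) {
                    sizes.put(name, countBytes(in));
                }
            }
        }
        return sizes;
    }

    // "Style B": one sequential pass; each entry arrives with its data in jar order.
    static Map<String, Integer> scanStyleB(Path jar) throws IOException {
        Map<String, Integer> sizes = new TreeMap<>();
        try (InputStream raw = Files.newInputStream(jar);
             JarInputStream jis = new JarInputStream(raw)) {
            JarEntry entry;
            while ((entry = jis.getNextJarEntry()) != null) {
                if (!entry.isDirectory()) sizes.put(entry.getName(), countBytes(jis));
            }
        }
        return sizes;
    }

    // Stands in for "scan the stream": reads to the end of the current entry.
    private static int countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        int total = 0;
        for (int n; (n = in.read(buf)) != -1; ) total += n;
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path jar = createSampleJar();
        try {
            System.out.println("Style A: " + scanStyleA(jar));
            System.out.println("Style B: " + scanStyleB(jar));
        } finally {
            Files.deleteIfExists(jar);
        }
    }
}
```

Both methods visit the same entries and see the same bytes; the difference is only whether each stream is reached through a per-name lookup or handed over during a single pass, which is the distinction the benchmark numbers above are about.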
