Hi David,

For me, only option 1 should be done.

Well, I didn't understand the whole mail: why do we need to browse the zip
file multiple times? Only for the getBytecode method? I think we can get
rid of the multiple scans and keep the link() features. Another point:
getAnnotatedClasses() should be able to return something even when link()
was not called.
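
A rough sketch of the "scan once, keep link()" idea, under assumptions: the
class SinglePassBytecodeCache and its methods are hypothetical names made up
for this example, not part of xbean-finder. It reads the jar sequentially a
single time and keeps the bytes of every class, so later lookups by class
name (the kind link() needs) are served from memory instead of reopening
the zip.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;

// Hypothetical helper, not xbean-finder code: one sequential pass over the jar.
public class SinglePassBytecodeCache {

    // class name (dotted) -> raw class file bytes
    private final Map<String, byte[]> bytecode = new HashMap<String, byte[]>();

    public SinglePassBytecodeCache(InputStream jar) throws IOException {
        try (JarInputStream in = new JarInputStream(jar)) {
            JarEntry entry;
            // Entries are visited in "jar order"; nothing is looked up by name here.
            while ((entry = in.getNextJarEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.endsWith(".class")) {
                    continue;
                }
                String className = name
                        .substring(0, name.length() - ".class".length())
                        .replace('/', '.');
                bytecode.put(className, readCurrentEntry(in));
            }
        }
    }

    // Returns the cached bytes, or null if the class was not in the jar.
    public byte[] getBytecode(String className) {
        return bytecode.get(className);
    }

    // Reads the current entry to its end; read() returns -1 at the entry boundary.
    private static byte[] readCurrentEntry(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }
}
```

The trade-off is memory: every class in the jar stays on the heap until the
cache is dropped, so this only makes sense if it is discarded after linking.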

If the zip parsing is badly done by the JRE (if it doesn't use fseek, for
instance), we simply have to rewrite it.

Well, in org.apache.xbean.finder.archive.JarArchive.JarIterator#JarIterator,
why is JarFile not used when possible?
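
For reference, a minimal sketch of by-name access with java.util.jar.JarFile.
The wrapper class JarFileLookup is a hypothetical name for this example; only
the JarFile calls (getJarEntry, getInputStream, close) are real JDK API. The
assumption is that keeping one JarFile open lets the JDK reuse the central
directory it read on open, rather than paying for it on every lookup; that
should be measured before relying on it.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical wrapper, not xbean-finder code: one open JarFile, many lookups.
public class JarFileLookup implements Closeable {

    private final JarFile jarFile;

    public JarFileLookup(File jar) throws IOException {
        // Opened once and kept open for all subsequent reads.
        this.jarFile = new JarFile(jar);
    }

    // Returns the entry's bytes, or null if there is no such entry.
    public byte[] read(String entryName) throws IOException {
        JarEntry entry = jarFile.getJarEntry(entryName);
        if (entry == null) {
            return null;
        }
        try (InputStream in = jarFile.getInputStream(entry)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }

    @Override
    public void close() throws IOException {
        jarFile.close();
    }
}
```

This only applies when the archive is a real file on disk; a jar that is
available only as a stream or a non-file URL would still need the sequential
approach.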

- Romain


2012/4/15 David Blevins <[email protected]>

> (decision and 4 choices at the bottom -- feedback requested)
>
> I did some studying of the zip file format and determined that part of the
> reworked xbean-finder Archive API was plain wrong.
>
> Using maps as an analogy here is how we were effectively scanning zips
> (jars):
>
>    "Style A"
>
>    Map<String, InputStream> zip = new HashMap<String, InputStream>();
>    for (String entryName : zip.keySet()) {
>        InputStream inputStream = zip.get(entryName);
>        // scan the stream
>    }
>
> While there is some indexing in a zip file in what is called the central
> directory, it isn't nearly good enough to support this type of random
> access.  The actual reading is done in C code when a zip file is randomly
> accessed in this way, but basically it seems about as slow as starting at
> the beginning of a stream and reading ahead in the stream until the index
> is hit and then reading for "real".  I doubt it's doing exactly that as in
> C code you should be able to start in the middle of a file, but let's put
> it this way... at the very minimum you are reading the Central Directory
> each and every single random access.
>
> I've reworked the Archive API so that when you iterate over it, you
> iterate over actual entries.  Using map again as an analogy it looks like
> this now:
>
>    "Style B"
>
>    for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
>        String className = entry.getKey();
>        InputStream inputStream = entry.getValue();
>        // scan the stream
>    }
>
>
> Using Atlassian Confluence as a driver to benchmark only the call to 'new
> AnnotationFinder(archive)', which is where our scanning happens, here are
> the results before (Style A) and after (Style B):
>
>
>  StyleA: 8.89s - 9.02s
>  StyleB: 3.33s - 3.52s
>
> Now unfortunately the 'link()' call used to resolve parent classes that
> are not in the jars scanned as well as to resolve meta-annotations still
> needs the StyleA random access.  These things don't involve going in "jar
> order", but definitely are random access.  With the new and improved code
> that scans Confluence at around 3.4s, here is the time with 'link()' added
>
>  StyleB scan + StyleA link: 15.61s - 15.75s
>
> That link() call adds another 12 seconds, roughly equivalent to the cost
> of 4 more scans.
>
> So the good news is we don't need the link.  We very much like the link,
> but we don't need the link for Java EE 6 certification.  We have two very
> excellent features associated with that linking.
>
>  - Meta-Annotations
>  - JAX-RS discovery of non-annotated Application subclasses (Application
> is a concrete class you subclass, like HttpServlet)
>
> We have more or less 4 kinds of choices on how we deal with this:
>
>  1. Link() is always called.  (always slow, extra features always enabled)
>  2. Link() can be disabled but is enabled by default.   (slow, w/optional
> fast flag, extra features enabled by default)
>  3. Link() can be enabled but is disabled by default.   (fast, w/optional
> slow flag, extra features disabled by default)
>  4. Link is never enabled.  (always fast, extra features permanently
> disabled)
>
>
> Thoughts, preferences?
>
>
> -David
>
>
