Re: XBean and scanning performance

Romain Manni-Bucau Sun, 15 Apr 2012 14:17:27 -0700

added a patch: https://issues.apache.org/jira/browse/XBEAN-206


can you test it against your mini bench please?

- Romain


2012/4/15 Romain Manni-Bucau <[email protected]>

> Hi David,
>
> for me only 1 should be done.
>
> well, I didn't understand the whole mail: why do we need to browse the zip
> file multiple times? only for the getbytecode method? I think we can get
> rid of the multiple scans and keep the link() features. Another point:
> getAnnotatedClasses() should be able to return something even when link()
> was not called.
>
> If the zip parsing is badly done by the jre (if it doesn't use fseek for
> instance) we simply have to rewrite it.
>
> well, in org.apache.xbean.finder.archive.JarArchive.JarIterator#JarIterator,
> why is JarFile not used when possible?
>
> - Romain
>
>
>
> 2012/4/15 David Blevins <[email protected]>
>
>> (decision and 4 choices at the bottom -- feedback requested)
>>
>> I did some studying of the zip file format and determined that part of
>> the reworked xbean-finder Archive API was plain wrong.
>>
>> Using maps as an analogy here is how we were effectively scanning zips
>> (jars):
>>
>>    "Style A"
>>
>>    Map<String, InputStream> zip = new HashMap<String, InputStream>();
>>    for (String entryName : zip.keySet()) {
>>        InputStream inputStream = zip.get(entryName);
>>        // scan the stream
>>    }
>>
>> While there is some indexing in a zip file in what is called the central
>> directory, it isn't nearly good enough to support this type of random
>> access.  The actual reading is done in C code when a zip file is randomly
>> accessed in this way, but basically it seems about as slow as starting at
>> the beginning of a stream and reading ahead in the stream until the index
>> is hit and then reading for "real".  I doubt it's doing exactly that as in
>> C code you should be able to start in the middle of a file, but let's put
>> it this way... at the very minimum you are reading the Central Directory
>> each and every single random access.
>>
>> I've reworked the Archive API so that when you iterate over it, you
>> iterate over actual entries.  Using map again as an analogy it looks like
>> this now:
>>
>>    "Style B"
>>
>>    for (Map.Entry<String, InputStream> entry : zip.entrySet()) {
>>        String className = entry.getKey();
>>        InputStream inputStream = entry.getValue();
>>        // scan the stream
>>    }
>>
>>
>> Using Atlassian Confluence as a driver to benchmark only the call to 'new
>> AnnotationFinder(archive)', which is where our scanning happens, here are
>> the results before (Style A) and after (Style B):
>>
>>
>>  StyleA: 8.89s - 9.02s
>>  StyleB: 3.33s - 3.52s
>>
>> Now unfortunately the 'link()' call used to resolve parent classes that
>> are not in the jars scanned as well as to resolve meta-annotations still
>> needs the StyleA random access.  These things don't involve going in "jar
>> order", but definitely are random access.  With the new and improved code
>> that scans Confluence at around 3.4s, here is the time with 'link()' added
>>
>>  StyleB scan + StyleA link: 15.61s - 15.75s
>>
>> That link() call adds another 12 seconds.  Roughly equivalent to the cost
>> of 4 more scans.
>>
>> So the good news is we don't need the link.  We very much like the link,
>> but we don't need the link for Java EE 6 certification.  We have two very
>> excellent features associated with that linking.
>>
>>  - Meta-Annotations
>>  - JAX-RS discovery of non-annotated Application subclasses (Application
>> is a concrete class you subclass, like HttpServlet)
>>
>> We have more or less 4 kinds of choices on how we deal with this:
>>
>>  1. Link() is always called.  (always slow, extra features always enabled)
>>  2. Link() can be disabled but is enabled by default.   (slow, w/optional
>> fast flag, extra features enabled by default)
>>  3. Link() can be enabled but is disabled by default.   (fast, w/optional
>> slow flag, extra features disabled by default)
>>  4. Link is never enabled.  (always fast, extra features permanently
>> disabled)
>>
>>
>> Thoughts, preferences?
>>
>>
>> -David
>>
>>
>
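
For readers who want to try the "Style A" versus "Style B" distinction from the quoted mail directly against java.util.zip, here is a minimal sketch. It is not the xbean-finder code: the class name ZipScanStyles, its methods, and the sample entry names are all hypothetical, and the byte counting only stands in for real class scanning. Style A is approximated with a by-name lookup per entry (ZipFile.getEntry), Style B with a single in-order pass (ZipInputStream.getNextEntry).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; none of these names exist in xbean-finder.
final class ZipScanStyles {

    // "Style A": collect the entry names, then look each entry up again by name.
    static long scanByName(Path jar) {
        long total = 0;
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            List<String> names = new ArrayList<>();
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
            for (String name : names) {
                ZipEntry entry = zip.getEntry(name); // by-name (random) access
                try (InputStream in = zip.getInputStream(entry)) {
                    total += drain(in);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // "Style B": walk the archive once, reading each entry as it is reached.
    static long scanInOrder(Path jar) {
        long total = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(jar))) {
            while (in.getNextEntry() != null) {
                total += drain(in); // read() returns -1 at the end of the current entry
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Stand-in for "scan the stream": counts the bytes without closing the stream.
    private static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long count = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            count += read;
        }
        return count;
    }

    // Builds a tiny two-entry jar (3 + 5 bytes) in a temp file so the sketch is self-contained.
    static Path createSampleJar() {
        try {
            Path jar = Files.createTempFile("sample", ".jar");
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(jar))) {
                out.putNextEntry(new ZipEntry("a/One.class"));
                out.write(new byte[] {1, 2, 3});
                out.closeEntry();
                out.putNextEntry(new ZipEntry("a/Two.class"));
                out.write(new byte[] {1, 2, 3, 4, 5});
                out.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both methods visit the same entries and read the same bytes; only the access pattern differs. Timing them on a large archive would be one way to check the numbers quoted above on a given JRE, though a sketch this small says nothing about the cost of link() itself.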
