Hi All,

There are performance issues with the JDK 1.5-style accessor methods in ooxml-schemas-1.1.jar, and my concern is whether this is a blocker for POI-3.7.

In ooxml-schemas-1.0.jar a collection of XmlBeans could be accessed via getXXXArray(), e.g.

 CTRow[] rows = sheet.getRowArray();

ooxml-schemas-1.1.jar was compiled with JDK-1.5 support and the preferred way of accessing collections is via getXXXList():

 List<CTRow> rows = sheet.getRowList();

XmlBeans seems to force users to use getXXXList(), because all getXXXArray() accessors are marked deprecated. So we changed everything to use getXXXList() and thought we were fine :).

I always thought that getXXXList() and getXXXArray() are synonyms and the returned List is a wrapper around the array. I also thought that the following two forms of walking the sheet matrix are equivalent:

    // old-style getXXXArray()
    for(CTRow row : sheet.getRowArray()){
        for(CTCell cell : row.getCArray()){

        }
    }

    // new getXXXList()
    for(CTRow row : sheet.getRowList()){
        for(CTCell cell : row.getCList()){

        }
    }

It turned out this is NOT so: getXXXArray() is far faster than getXXXList(). I analyzed the auto-generated source code and found that the two work differently.

A call to getXXXArray() performs an XPATH request against the underlying DOM and returns the selected beans. A call to getXXXList() does no work up front: the returned List is a custom subclass of AbstractList whose overridden List.get(int index) sends an XPATH request. This means an XPATH request is sent on every iteration step, or on every call to List.get(int index).
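To illustrate the difference described above, here is a toy model in plain Java. It is only a sketch and is not XmlBeans code: the Store class, its selectAll() and selectOne() methods, and the query counter are all hypothetical stand-ins for the DOM and the XPATH requests. It shows that iterating the array costs one query in total, while iterating the lazy List costs one query per element.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

/** Toy model of the two accessor styles; NOT the generated XmlBeans code. */
class LazyListDemo {

    /** Hypothetical stand-in for the underlying DOM; counts queries made against it. */
    static class Store {
        private final List<String> rows = new ArrayList<String>();
        int queries = 0;

        Store(int size) {
            for (int i = 0; i < size; i++) {
                rows.add("row" + i);
            }
        }

        /** One query that selects every child (models getXXXArray()). */
        String[] selectAll() {
            queries++;
            return rows.toArray(new String[0]);
        }

        /** One query that selects a single child (models List.get(int)). */
        String selectOne(int index) {
            queries++;
            return rows.get(index);
        }

        int count() {
            return rows.size();
        }
    }

    /** Array style: a single query, then a plain array. */
    static String[] getRowArray(Store store) {
        return store.selectAll();
    }

    /** List style: nothing happens here; every get(int) queries the store. */
    static List<String> getRowList(final Store store) {
        return new AbstractList<String>() {
            @Override
            public String get(int index) {
                return store.selectOne(index);
            }

            @Override
            public int size() {
                return store.count();
            }
        };
    }

    /** Iterates n rows via the array accessor and returns the number of queries. */
    static int queriesViaArray(int n) {
        Store store = new Store(n);
        for (String row : getRowArray(store)) {
            row.length(); // touch the element
        }
        return store.queries;
    }

    /** Iterates n rows via the list accessor and returns the number of queries. */
    static int queriesViaList(int n) {
        Store store = new Store(n);
        for (String row : getRowList(store)) {
            row.length(); // touch the element
        }
        return store.queries;
    }

    public static void main(String[] args) {
        System.out.println("queries via array: " + queriesViaArray(1000));
        System.out.println("queries via list:  " + queriesViaList(1000));
    }
}
```

In this model the per-element query is cheap; in the real schemas each request gets more expensive as the DOM grows, which would explain why the gap widens with file size.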

You won't notice much difference for small files, but the larger the DOM, the more dramatic the difference is.

Below are my benchmarks. I ran the code snippets above against sample sheets of different sizes.

matrix (rows x columns)   getXXXArray()        getXXXList()

100 x 100                 35 ms                70 ms
1000 x 100                150 ms               700 ms
5000 x 100                570 ms               4900 ms
10000 x 100               3600 ms              27000 ms
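For anyone who wants to reproduce numbers like these, here is a minimal timing helper. It is a sketch only: the class name AccessorTiming, the method bestMillis() and the warm-up and run counts are my own assumptions, not necessarily how the figures above were taken. Pass it a Runnable that walks the sheet with one of the two loops.

```java
/** Hypothetical timing helper; not part of POI or XmlBeans. */
class AccessorTiming {

    /**
     * Runs the task `warmups` times untimed, then `runs` times timed,
     * and returns the fastest timed run in milliseconds.
     */
    static long bestMillis(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // give the JIT a chance to compile the loop
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best / 1000000L; // nanoseconds to milliseconds
    }

    public static void main(String[] args) {
        // Placeholder workload; replace the body with the getXXXArray()
        // or getXXXList() loop over a real sheet.
        final long[] sink = {0};
        long ms = bestMillis(new Runnable() {
            public void run() {
                for (int i = 0; i < 1000000; i++) {
                    sink[0] += i;
                }
            }
        }, 3, 5);
        System.out.println("best run: " + ms + " ms");
    }
}
```

Taking the best of several runs after a warm-up reduces noise from JIT compilation and garbage collection, but the absolute numbers will still depend on the machine and JVM.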


I'm going to produce the release artifacts by Friday. There is no time to fix this problem: it is a serious change and I don't want to accidentally break anything. I'm inclined to think it is OK to release, since we did three betas and so far the feedback has been positive. I plan POI-3.8 for Dec-Jan and we can defer the fix until then.

What do people think?

Regards,
Yegor
