https://bz.apache.org/bugzilla/show_bug.cgi?id=57893

            Bug ID: 57893
           Summary: XSSFSheet.getMergedRegion(int) takes O(n^2) time
           Product: POI
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XSSF
          Assignee: [email protected]
          Reporter: [email protected]

Created attachment 32715
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32715&action=edit
Sample code demonstrating slow performance

The attached Java code compares the time taken to loop over
sheet.getMergedRegion(i) for all i against the time taken for the same
operation via direct access to the deprecated CTMergeCells.getMergeCellArray().

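The sample input I will attach, many-merges.xlsx, has 50k merged regions; I
get 8000ms for the getMergedRegion(i) loop vs 80ms for the array access. The
real-world input I found this with has 250k merged regions and takes ~300
seconds vs ~300ms. That is a 5x increase in regions for a ~37x increase in time
on the slow path, versus under 4x on the fast path.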

The reason seems to be:

XSSFSheet.getMergedRegion(int)
-> CTMergeCells.getMergeCellArray(int)
-> Xobj.find_element_user(QName, int)

which does:

    for ( Xobj x = _firstChild ; x != null ; x = x._nextSibling )
      if (x.isElem() && x._name.equals( name ) && --i < 0)
        return x.getUser();

So each access to mergeCell i walks the sibling list from the first child and
compares the element name of every preceding mergeCell, which makes a loop over
all n regions O(n^2). The deprecated CTMergeCells.getMergeCellArray() instead
uses find_all_element_users(), avoiding the O(n^2).
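
To make the cost difference concrete, here is a minimal, self-contained model
of the two access patterns. It is only a sketch: Node, findByIndex, findAll and
buildChain are hypothetical names invented for illustration, not XMLBeans or
POI API, and the comparison counter merely stands in for the name checks done
by the loop quoted above.

```java
import java.util.ArrayList;
import java.util.List;

public class SiblingWalkDemo {
    /** Hypothetical stand-in for an XML child element in a sibling chain. */
    static final class Node {
        final String name;
        final int value;
        Node nextSibling;

        Node(String name, int value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Counts element-name comparisons so both strategies can be compared. */
    static long comparisons = 0;

    /** Indexed lookup: restarts from the first child on every call. */
    static Node findByIndex(Node firstChild, String name, int i) {
        for (Node x = firstChild; x != null; x = x.nextSibling) {
            comparisons++;
            if (x.name.equals(name) && --i < 0) {
                return x;
            }
        }
        return null;
    }

    /** One-pass lookup: collects every matching child in a single walk. */
    static List<Node> findAll(Node firstChild, String name) {
        List<Node> out = new ArrayList<>();
        for (Node x = firstChild; x != null; x = x.nextSibling) {
            comparisons++;
            if (x.name.equals(name)) {
                out.add(x);
            }
        }
        return out;
    }

    /** Builds a chain of n "mergeCell" nodes with values 0..n-1. */
    static Node buildChain(int n) {
        Node first = null;
        Node last = null;
        for (int i = 0; i < n; i++) {
            Node node = new Node("mergeCell", i);
            if (first == null) {
                first = node;
            } else {
                last.nextSibling = node;
            }
            last = node;
        }
        return first;
    }

    public static void main(String[] args) {
        int n = 2000;
        Node first = buildChain(n);

        // Looping over findByIndex costs 1 + 2 + ... + n = n(n+1)/2 comparisons.
        comparisons = 0;
        for (int i = 0; i < n; i++) {
            findByIndex(first, "mergeCell", i);
        }
        System.out.println("indexed loop comparisons: " + comparisons);

        // A single findAll walk costs n comparisons.
        comparisons = 0;
        findAll(first, "mergeCell");
        System.out.println("one-pass comparisons:     " + comparisons);
    }
}
```

In this model the indexed loop performs n(n+1)/2 name comparisons while the
one-pass walk performs n, which matches the shape of the timings reported
above.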

Would you be interested in a patch? The catch is that the change apparently
belongs in CTMergeCells in ooxml-schemas, which seems to be generated(?); I'm
not sure what generates it, or why getMergeCellArray() (the only fast method)
is deprecated. getMergeCellList() exists, but it only hands out a wrapper
around the slow getMergeCellArray(int).
