https://bz.apache.org/bugzilla/show_bug.cgi?id=57893
Bug ID: 57893
Summary: XSSFSheet.getMergedRegion(int) takes O(n^2) time
Product: POI
Version: unspecified
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: XSSF
Assignee: [email protected]
Reporter: [email protected]
Created attachment 32715
--> https://bz.apache.org/bugzilla/attachment.cgi?id=32715&action=edit
Sample code demonstrating slow performance
The attached Java program compares the time taken to loop over
sheet.getMergedRegion(i) for all i with the time taken for the same operation
via direct access to the deprecated CTMergeCells.getMergeCellArray().
The sample input I will attach, many-merges.xlsx, has 50k merged regions, and I
get 8000ms for getMergedRegion(i) vs 80ms for getMergeCellArray(). The
real-world input I found this with has 250k merged regions and takes
~300 seconds vs ~300ms.
The reason seems to be:

    XSSFSheet.getMergedRegion(int)
      -> CTMergeCells.getMergeCellArray(int)
      -> Xobj.find_element_user(QName, int)

which does:

    for ( Xobj x = _firstChild ; x != null ; x = x._nextSibling )
        if (x.isElem() && x._name.equals( name ) && --i < 0)
            return x.getUser();
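So each indexed access walks the sibling list from the first child, comparing
the element name of every preceding mergeCell, and looping over all n regions
costs about n^2/2 comparisons in total (roughly 1.25 billion for 50k regions).
The deprecated CTMergeCells.getMergeCellArray() instead uses
find_all_element_users(), which collects every mergeCell in one pass and so
avoids the O(n^2) behaviour.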
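
To illustrate the access pattern in isolation, here is a minimal stand-alone
sketch. It does not use XMLBeans or POI: the Node class, the method names and
the comparison counter are all hypothetical stand-ins, and it only mimics the
shape of the sibling-list walk quoted above, not the real implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class SiblingScanDemo {
    /** Hypothetical stand-in for an XML child element in a sibling-linked list. */
    static final class Node {
        final String name;
        final String value;
        Node nextSibling;
        Node(String name, String value) { this.name = name; this.value = value; }
    }

    /** Counts name comparisons, so the two strategies can be compared without timing. */
    static long comparisons = 0;

    /** Builds a sibling list of n "mergeCell" nodes and returns the first child. */
    static Node build(int n) {
        Node first = null, last = null;
        for (int i = 0; i < n; i++) {
            Node node = new Node("mergeCell", "ref" + i);
            if (first == null) first = node; else last.nextSibling = node;
            last = node;
        }
        return first;
    }

    /** Indexed lookup: restarts from the first child on every call. */
    static String findByIndex(Node firstChild, String name, int i) {
        for (Node x = firstChild; x != null; x = x.nextSibling) {
            comparisons++;
            if (x.name.equals(name) && --i < 0) return x.value;
        }
        return null;
    }

    /** Bulk lookup: one pass that collects every matching child. */
    static List<String> findAll(Node firstChild, String name) {
        List<String> out = new ArrayList<>();
        for (Node x = firstChild; x != null; x = x.nextSibling) {
            comparisons++;
            if (x.name.equals(name)) out.add(x.value);
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 2000;
        Node first = build(n);

        comparisons = 0;
        for (int i = 0; i < n; i++) findByIndex(first, "mergeCell", i);
        System.out.println("indexed comparisons: " + comparisons); // n*(n+1)/2

        comparisons = 0;
        findAll(first, "mergeCell");
        System.out.println("bulk comparisons: " + comparisons);    // n
    }
}
```

With n = 2000 the indexed loop performs 2,001,000 name comparisons against
2,000 for the single bulk pass, which is the same quadratic-versus-linear gap
as in the timings above, just at a smaller scale.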
Would you be interested in a patch? The only catch is that the change seems to
belong in CTMergeCells in ooxml-schemas, which appears to be generated(?), and
I'm not sure what generates it or why getMergeCellArray() (the only fast
method) is deprecated. getMergeCellList() exists, but it only hands out a
wrapper around the slow getMergeCellArray(int).