[ 
https://issues.apache.org/jira/browse/XMLBEANS-438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010650#comment-18010650
 ] 

PJ Fanning edited comment on XMLBEANS-438 at 7/29/25 1:08 PM:
--------------------------------------------------------------

I tried some changes but they actually made the performance worse.
I'm afraid it seems likely that xmlbeans needs a bigger rewrite than I have 
time for.
My changes are in https://github.com/pjfanning/xmlbeans-1/tree/perf438 but the 
bulk operations don't seem to help. I suspect that the way that cursors are 
used under the hood are the problem.

XObj uses a singly linked list of nodes to track child nodes. It feels like 
switching to an array or similar would improve most operations. But I suspect 
that we also need to find a way to avoid running cursor updates for the bulk of 
operations as well. We probably can't remove the cursor support but we should 
probably avoid the cursors unless someone is using one of the cursor APIs 
explicitly.

The underlying code is complicated and not adequately tested so changes are 
very difficult to get right.


was (Author: fanningpj):
I tried some changes but they actually made the performance worse.
I'm afraid it seems likely that xmlbeans needs a bigger rewrite than I have 
time for.
My changes are in https://github.com/pjfanning/xmlbeans-1/tree/perf438 but the 
bulk operations don't seem to help. I suspect that the way that cursors are 
used under the hood are the problem.

> O(n^2) operation in xmlbeans would like to make O(n log n) or better
> --------------------------------------------------------------------
>
>                 Key: XMLBEANS-438
>                 URL: https://issues.apache.org/jira/browse/XMLBEANS-438
>             Project: XMLBeans
>          Issue Type: Improvement
>    Affects Versions: Version 2.4 
>         Environment: Using XMLbeans inside of POI when saving spreadsheets in 
> the XLSX file format.
>            Reporter: Bryce Alcock
>            Priority: Major
>              Labels: OOXML, XLSX, XmlComplexContentImpl, Xobj
>             Fix For: unspecified
>
>         Attachments: Main.java, PoiTest.jar, ResponseTimeVsRowCount.png
>
>
> Using POI to save a spread sheet in XLSX format causes an O(n^2) operation to 
> occur.
> Specifically:
> Just a Quick overview
> it seems like XmlComplexContentImpl.arraySetterHelper iterates over all 
> elements in the spread sheet.
> then in side of that iteration, the Xobj.find_element_user(Qname,int) is an 
> Order n operation.
> Just making Xobj.find_element_user a O(log n) would completely solve the 
> problem.
> ANALYSIS:   O(n^2) in XML Beans:   (Around Line 1153 or so pf 
> XmlComplexContentImpl.java method arraySetterHelper)
> -----------------------------------------------------------
>  //   Order N  (this is invoked 1 time for each element in sources (~120,000 
> records in my case)
>  for ( i++, j++; i < sources.length; i++, j++)
>  {
>  long t1 = System.currentTimeMillis();
>     XmlCursor c = sources[i].isImmutable() ? null : sources[i].newCursor();
>  getCursorTime += (System.currentTimeMillis()-t1);
>  long t2=System.currentTimeMillis();
>     if (c != null && c.toParent() && c.getObject() == this)
>     {
>  ifCheckTime += (System.currentTimeMillis()-t2);
>  long t3 = System.currentTimeMillis();
>         c.dispose();
>  long t4 = System.currentTimeMillis();
>  //  THIS CAUSES O(n^2) SEE CODE BELOW, but this on O(n) for each record.
>  //  THIS IS THE INEFFICIENCY in XMLBeans...
>         current = store.find_element_user( elemName, j );
>  findUserElement += (System.currentTimeMillis() - t4);
>         if (current == sources[i])
>               ; // advance
>         else
>         {
>              // Fall back to the general case
>              break;
>         }
>  insideIfTime += (System.currentTimeMillis()-t3);
>    }
>    else
>    {
>  ifCheckTime += (System.currentTimeMillis()-t2);
>        c.dispose();
>        // Insert before the current element
>        TypeStoreUser user = store.insert_element_user( elemName, j );
>        ((XmlObjectBase) user).set( sources[i] );
>     }
>  }
>  ---------------- Method from store.find_element_user(QName,int i)---------
>  //  THIS METHOD IS 0(n)
>  //  but this is called n times by XmlComplexContentImpl.java around line 
> 1153....
>  public TypeStoreUser find_element_user ( QName name, int i )
>   {
>      for ( Xobj x = _firstChild ; x != null ; x = x._nextSibling )
>         if (x.isElem() && x._name.equals( name ) && --i < 0)
>             return x.getUser();
>         return null;
>   }
> I will add a sample code showing exactly how to reproduce this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org

Reply via email to