[jira] [Commented] (XALANJ-2607) Performance improvement for large documents using ID attributes

Mukul Gandhi (Jira) Sat, 14 Sep 2019 05:04:30 -0700


    [ 
https://issues.apache.org/jira/browse/XALANJ-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929758#comment-16929758
 ]


Mukul Gandhi commented on XALANJ-2607:
--------------------------------------

I have a few more findings on the provided patch for this issue:

1) If I build the Xalan jar with the provided patch using JDK 1.5 and run the 
test case with JRE 1.5, I get java.lang.OutOfMemoryError.

2) If I build the Xalan jar with the provided patch using JDK 1.8 and run the 
test case with JRE 1.8, the stated performance improvement is achieved.

3) Running the test case without the patch, with any JRE, causes memory issues 
and increased latency.

Therefore, the patch provides a benefit with JDK 1.8 but not with JDK 1.5. 
Since Xalan supports JDK 1.5 environments as well, it would be good if the 
patch also improved performance in a JRE 1.5 environment.

I have not yet fully assessed the functional correctness of the provided 
patch, but I will try to do that. Anyone else is welcome to do so as well and 
share their findings.

> Performance improvement for large documents using ID attributes
> ---------------------------------------------------------------
>
>                 Key: XALANJ-2607
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2607
>             Project: XalanJ2
>          Issue Type: Improvement
>      Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>          Components: DTM
>    Affects Versions: 2.7.2
>         Environment: Tested on but not limited to: Windows, x64, JRE 1.8
>            Reporter: Matthias Urban
>            Assignee: Steven J. Hathaway
>            Priority: Major
>         Attachments: DTMStringPool.patch
>
>
> XalanJ gets very slow for large XML documents that use ID attributes, which 
> are common in article lists. If, for instance, an article list with 
> 1,000,000 entries is parsed, it takes 6 minutes (on my machine) just to 
> build the DTM. This is due to a design decision in DTMStringPool, which uses 
> a fixed-size hash table of 101 entries. This works astoundingly well for 
> documents with fewer than 10,000 different attribute values. Beyond that it 
> gets slower and slower. 
> I've tested multiple solutions, such as increasing the hash table size to 
> 100,000 or 1,000,000 (overkill for everyday documents), or making it 
> configurable while leaving the current size as the default (which requires 
> introducing a 'backdoor' system property). In the end the best solution was 
> to use a simple HashMap for the string lookup. It proved to be as fast as 
> the fixed-size table while scaling very well. See the patch attached to this 
> issue. It was created using the current trunk version of DTMStringPool.java.
> Here is an example for testing. Without the patch applied it takes about 6 
> minutes to finish. With the patch applied it takes only 3 seconds!
> {code:title=Test.java}
> import java.io.File;
> import java.io.FileOutputStream;
> import java.io.FileWriter;
> import java.util.Date;
> import javax.xml.transform.Transformer;
> import javax.xml.transform.TransformerFactory;
> import javax.xml.transform.stream.StreamResult;
> import javax.xml.transform.stream.StreamSource;
> public class Test {
>   public static void main(String[] args) {
>     try {
>       File xmlfile = new File("doc.xml");
>       FileWriter fout = new FileWriter(xmlfile);
>       fout.write("<catalogue>\n");
>       for (int i = 0; i < 1000000; i++) {
>         fout.append("<article id=\"" + i + "\">articlename</article>\n");
>       }
>       fout.write("</catalogue>");
>       fout.close();
>       System.out.println("Start : " + new Date());
>       TransformerFactory factory = TransformerFactory.newInstance();
>       Transformer transformer = factory.newTransformer(
>           new StreamSource(new File("script.xsl")));
>       transformer.transform(new StreamSource(xmlfile),
>           new StreamResult(new FileOutputStream("out.txt")));
>       System.out.println("End   : " + new Date());
>     }
>     catch (Exception e) {
>       e.printStackTrace();
>     }
>   }
> }
> {code}
> {code:xml|title=script.xsl}
> <?xml version="1.0"?>
> <xsl:stylesheet version="1.0" 
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:output method="text"/>
>   <xsl:template match="text()|@*"/>
>   <xsl:template match="/">
>     <xsl:apply-templates select="*"/>
>   </xsl:template>
>   <xsl:template match="article">
>     <xsl:value-of select="@id"/>
>     <xsl:text>&#10;</xsl:text>
>   </xsl:template>
> </xsl:stylesheet>
> {code}
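
To make the scaling argument in the description concrete, here is a minimal 
sketch that contrasts a pool with a fixed number of hash buckets against a 
HashMap-backed pool. It is not the attached patch and not the actual 
DTMStringPool source: the interface and class names (StringPool, 
FixedBucketPool, HashMapPool, StringPoolSketch) are hypothetical, and only the 
bucket count of 101 is taken from the description. The sketch deliberately 
avoids language features newer than Java 5. It also shows one simple property 
worth checking for functional correctness: both variants should hand out the 
same index for the same sequence of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pool contract: each distinct string gets a stable int index. */
interface StringPool {
    int stringToIndex(String s);
    String indexToString(int i);
}

/** Assumed shape of a fixed-size table: 101 buckets, collisions chained. */
class FixedBucketPool implements StringPool {
    private static final int BUCKETS = 101; // size quoted in the description
    private final List<String> strings = new ArrayList<String>();
    private final int[] head = new int[BUCKETS];                  // newest index per bucket, -1 if empty
    private final List<Integer> next = new ArrayList<Integer>();  // chain link for each stored index

    FixedBucketPool() {
        Arrays.fill(head, -1);
    }

    public int stringToIndex(String s) {
        int bucket = (s.hashCode() & 0x7fffffff) % BUCKETS;
        // Linear walk along the chain: its length grows with the number of
        // distinct strings because the bucket count never changes.
        for (int i = head[bucket]; i != -1; i = next.get(i)) {
            if (strings.get(i).equals(s)) {
                return i;
            }
        }
        int index = strings.size();
        strings.add(s);
        next.add(head[bucket]);
        head[bucket] = index;
        return index;
    }

    public String indexToString(int i) {
        return strings.get(i);
    }
}

/** Same contract, with the lookup delegated to a HashMap that resizes itself. */
class HashMapPool implements StringPool {
    private final List<String> strings = new ArrayList<String>();
    private final Map<String, Integer> indexOf = new HashMap<String, Integer>();

    public int stringToIndex(String s) {
        Integer known = indexOf.get(s);
        if (known != null) {
            return known;
        }
        int index = strings.size();
        strings.add(s);
        indexOf.put(s, index);
        return index;
    }

    public String indexToString(int i) {
        return strings.get(i);
    }
}

class StringPoolSketch {
    /** Interns n distinct strings and returns the elapsed time in milliseconds. */
    static long timeMillis(StringPool pool, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            pool.stringToIndex(Integer.toString(i));
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        int n = 100000; // arbitrary size for illustration
        System.out.println("fixed buckets: " + timeMillis(new FixedBucketPool(), n) + " ms");
        System.out.println("HashMap:       " + timeMillis(new HashMapPool(), n) + " ms");
    }
}
```

With 101 buckets, every lookup of a new string walks a chain holding roughly 
1/101 of all strings stored so far, so the total work grows about 
quadratically with the number of distinct values. The HashMap variant grows 
its table as entries are added, which keeps lookups close to constant time on 
average. Actual timings depend on the JVM and heap settings, so none are 
quoted here.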



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
