[ https://issues.apache.org/jira/browse/XALANJ-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Urban updated XALANJ-2607:
-----------------------------------
    Summary: Performance improvement for large documents using ID attributes
             (was: Improve performance for large documents using ID attributes)

> Performance improvement for large documents using ID attributes
> ---------------------------------------------------------------
>
>                 Key: XALANJ-2607
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2607
>             Project: XalanJ2
>          Issue Type: Improvement
>      Security Level: No security risk; visible to anyone (Ordinary problems
>                      in Xalan projects. Anybody can view the issue.)
>          Components: DTM
>    Affects Versions: 2.7.2
>         Environment: Tested on but not limited to: Windows, x64, JRE 1.8
>            Reporter: Matthias Urban
>            Assignee: Steven J. Hathaway
>         Attachments: DTMStringPool.patch
>
>
> XalanJ becomes very slow for large XML documents that use ID attributes,
> which are common in article lists. For instance, parsing an article list
> with 1,000,000 entries takes 6 minutes (on my machine) just to build the
> DTM. The cause is a design decision in DTMStringPool, which uses a
> fixed-size hash table of 101 entries. This works astoundingly well for
> documents with fewer than 10,000 distinct attribute values; beyond that, it
> gets progressively slower.
> I've tested several solutions, such as increasing the hash table size to
> 100,000 or 1,000,000 (overkill for everyday documents), or making the size
> configurable while keeping the current size as the default (which requires
> introducing a 'backdoor' system property). In the end, the best solution was
> to use a simple HashMap for the string lookup. It proved to be as fast as
> the fixed-size table while scaling very well. See the patch attached to this
> issue; it was created against the current trunk version of
> DTMStringPool.java.
> Here is an example for testing. Without the patch it takes about 6 minutes
> to finish; with the patch applied it takes only 3 seconds!
> {code:title=Test.java}
> import java.io.File;
> import java.io.FileOutputStream;
> import java.io.FileWriter;
> import java.util.Date;
> import javax.xml.transform.Transformer;
> import javax.xml.transform.TransformerFactory;
> import javax.xml.transform.stream.StreamResult;
> import javax.xml.transform.stream.StreamSource;
> public class Test {
>   public static void main(String[] args) {
>     try {
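>       // Generate a test document with 1,000,000 <article> elements,
>       // each carrying a distinct id attribute value.
>       File xmlfile = new File("doc.xml");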
>       FileWriter fout = new FileWriter(xmlfile);
>       fout.write("<catalogue>\n");
>       for (int i = 0; i < 1000000; i++) {
>         fout.append("<article id=\"" + i + "\">articlename</article>\n");
>       }
>       fout.write("</catalogue>");
>       fout.close();
>       System.out.println("Start : " + new Date());
>       TransformerFactory factory = TransformerFactory.newInstance();
>       Transformer transformer =
>           factory.newTransformer(new StreamSource(new File("script.xsl")));
>       transformer.transform(new StreamSource(xmlfile),
>           new StreamResult(new FileOutputStream("out.txt")));
>       System.out.println("End   : " + new Date());
>     }
>     catch (Exception e) {
>       e.printStackTrace();
>     }
>   }
> }
> {code}
> {code:xml|title=script.xsl}
> <?xml version="1.0"?>
> <xsl:stylesheet version="1.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:output method="text"/>
>   <xsl:template match="text()|@*"/>
>   <xsl:template match="/">
>     <xsl:apply-templates select="*"/>
>   </xsl:template>
>   <xsl:template match="article">
>     <xsl:value-of select="@id"/>
>     <xsl:text>&#10;</xsl:text>
>   </xsl:template>
> </xsl:stylesheet>
> {code}
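
The HashMap-based lookup described in the issue might look roughly like the
following. This is a minimal sketch, not the contents of DTMStringPool.patch:
the class name HashMapStringPool, the NULL constant, and the behavior for null
input are assumptions made for illustration. The motivation is simple
arithmetic: with 1,000,000 distinct values spread over 101 buckets, each bucket
holds about 9,900 entries on average (assuming chained buckets), so every
lookup has to walk a long chain, whereas a HashMap grows its table as entries
are added.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a string pool backed by a HashMap.
// It is NOT the code from DTMStringPool.patch.
class HashMapStringPool {
    /** Index returned for a null string; an assumption of this sketch. */
    static final int NULL = -1;

    // index -> string, in order of first appearance
    private final List<String> intToString = new ArrayList<>();
    // string -> index; replaces a fixed-size bucket array
    private final Map<String, Integer> stringToInt = new HashMap<>();

    /** Returns the index already assigned to s, or assigns the next free one. */
    synchronized int stringToIndex(String s) {
        if (s == null) {
            return NULL;
        }
        Integer known = stringToInt.get(s);
        if (known != null) {
            return known;
        }
        int index = intToString.size();
        intToString.add(s);
        stringToInt.put(s, index);
        return index;
    }

    /** Returns the string stored at index i, or null for the NULL index. */
    synchronized String indexToString(int i) {
        if (i == NULL) {
            return null;
        }
        return intToString.get(i);
    }

    public static void main(String[] args) {
        HashMapStringPool pool = new HashMapStringPool();
        long start = System.nanoTime();
        // Same workload shape as the test case: 1,000,000 distinct values.
        for (int i = 0; i < 1_000_000; i++) {
            pool.stringToIndex(Integer.toString(i));
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Pooled 1,000,000 distinct strings in "
            + elapsedMs + " ms");
    }
}
```

Each lookup and insertion is expected constant time on average, so the total
cost of pooling n distinct strings grows roughly linearly with n instead of
quadratically.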



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
