[ https://issues.apache.org/jira/browse/XALANJ-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929759#comment-16929759 ]
Gary Gregory commented on XALANJ-2607:
--------------------------------------
I think we should raise the platform requirements for Xalan to at least Java 7
if not 8.
> Performance improvement for large documents using ID attributes
> ---------------------------------------------------------------
>
> Key: XALANJ-2607
> URL: https://issues.apache.org/jira/browse/XALANJ-2607
> Project: XalanJ2
> Issue Type: Improvement
> Security Level: No security risk; visible to anyone (Ordinary problems in
> Xalan projects. Anybody can view the issue.)
> Components: DTM
> Affects Versions: 2.7.2
> Environment: Tested on but not limited to: Windows, x64, JRE 1.8
> Reporter: Matthias Urban
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: DTMStringPool.patch
>
>
> XalanJ becomes very slow for large XML documents that use ID attributes, as
> article lists often do. For instance, parsing an article list with 1,000,000
> entries takes 6 minutes (on my machine) just to build the DTM. This is due to
> a design decision in DTMStringPool: it uses a fixed-size hash table of 101
> entries. This works astoundingly well for documents with fewer than 10,000
> distinct attribute values, but beyond that it gets progressively slower,
> because the hash chains grow linearly with the number of distinct values.
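To make the slowdown concrete, here is a simplified model of a string pool with a fixed number of chained hash buckets (101, as in the issue). The class name `FixedBucketPool` and its `comparisons` counter are hypothetical and for illustration only; this is not Xalan's actual DTMStringPool source. Because the bucket count never grows, each new distinct value is compared against a whole chain whose average length is about n/101, so filling the pool with n distinct values costs on the order of n²/202 string comparisons.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of a string pool backed by a FIXED number of
 * chained hash buckets. Not Xalan's DTMStringPool source.
 */
class FixedBucketPool {
    static final int BUCKETS = 101;                       // never grows, as described in the issue

    private final List<String> strings = new ArrayList<>(); // index -> string
    private final int[] head = new int[BUCKETS];            // first index in each chain, -1 if empty
    private final List<Integer> next = new ArrayList<>();   // next index in the same chain, -1 at end
    long comparisons = 0;                                   // counts equals() calls, for illustration

    FixedBucketPool() {
        Arrays.fill(head, -1);
    }

    /** Returns the index of s, adding s to the pool if it is absent. */
    int stringToIndex(String s) {
        int bucket = (s.hashCode() & 0x7fffffff) % BUCKETS;
        int last = -1;
        // Walk the whole chain: cost is proportional to the chain length.
        for (int i = head[bucket]; i != -1; i = next.get(i)) {
            comparisons++;
            if (strings.get(i).equals(s)) {
                return i;
            }
            last = i;
        }
        int idx = strings.size();
        strings.add(s);
        next.add(-1);
        if (last == -1) {
            head[bucket] = idx;   // first entry in this bucket
        } else {
            next.set(last, idx);  // append to the end of the chain
        }
        return idx;
    }

    String indexToString(int i) {
        return strings.get(i);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10_000, 100_000}) {
            FixedBucketPool pool = new FixedBucketPool();
            for (int i = 0; i < n; i++) {
                pool.stringToIndex(Integer.toString(i));
            }
            System.out.println(n + " distinct values -> " + pool.comparisons + " comparisons");
        }
    }
}
```

With a reasonably even spread of hash codes, expect the comparison count to grow roughly 100-fold when the number of distinct values grows 10-fold, which is the quadratic behaviour that makes a 1,000,000-entry document so slow.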
> I tested several solutions, such as increasing the hash table size to 100,000
> or 1,000,000 (overkill for everyday documents), or making the size
> configurable while keeping the current size as the default (which requires
> introducing a 'backdoor' system property). In the end, the best solution was
> to use a plain HashMap for the string lookup. It proved to be as fast as the
> fixed-size table while scaling very well. See the patch attached to this
> issue; it was created against the current trunk version of DTMStringPool.java.
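A minimal sketch of the HashMap approach, assuming a pool that maps strings to consecutive integer indices and back: a `java.util.HashMap` handles the string-to-index lookup and an `ArrayList` handles index-to-string. `HashMapStringPool` is a hypothetical name; the method names are modelled on DTMStringPool's `stringToIndex`/`indexToString`, but this is not the code in the attached DTMStringPool.patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a string pool whose lookup table is a HashMap that
 * resizes itself as entries are added. Not the attached DTMStringPool.patch.
 */
class HashMapStringPool {
    private final List<String> intToString = new ArrayList<>();       // index -> string
    private final Map<String, Integer> stringToInt = new HashMap<>(); // string -> index

    /** Returns the index of s, adding s to the pool if it is not present yet. */
    int stringToIndex(String s) {
        Integer idx = stringToInt.get(s);
        if (idx != null) {
            return idx;
        }
        int newIdx = intToString.size();   // indices are assigned in insertion order
        intToString.add(s);
        stringToInt.put(s, newIdx);
        return newIdx;
    }

    /** Returns the string stored at index i. */
    String indexToString(int i) {
        return intToString.get(i);
    }

    /** Number of distinct strings in the pool. */
    int size() {
        return intToString.size();
    }

    public static void main(String[] args) {
        HashMapStringPool pool = new HashMapStringPool();
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            pool.stringToIndex(Integer.toString(i));
        }
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(pool.size() + " distinct values pooled in " + ms + " ms");
    }
}
```

Because HashMap rehashes into a larger table as it fills, the expected lookup cost stays roughly constant instead of growing with the number of distinct values. The sketch uses `get`/`put` rather than Java 8's `computeIfAbsent`, so it compiles on Java 7.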
> Here is a test case. Without the patch, it takes about 6 minutes to finish;
> with the patch applied, it takes only 3 seconds!
> {code:title=Test.java}
> import java.io.BufferedWriter;
> import java.io.File;
> import java.io.FileOutputStream;
> import java.io.FileWriter;
> import java.io.OutputStream;
> import java.io.Writer;
> import java.util.Date;
> import javax.xml.transform.Transformer;
> import javax.xml.transform.TransformerFactory;
> import javax.xml.transform.stream.StreamResult;
> import javax.xml.transform.stream.StreamSource;
>
> public class Test {
>     public static void main(String[] args) {
>         try {
>             // Generate doc.xml: 1,000,000 <article> elements, each with a distinct id value.
>             File xmlfile = new File("doc.xml");
>             try (Writer fout = new BufferedWriter(new FileWriter(xmlfile))) {
>                 fout.write("<catalogue>\n");
>                 for (int i = 0; i < 1000000; i++) {
>                     fout.write("<article id=\"" + i + "\">articlename</article>\n");
>                 }
>                 fout.write("</catalogue>");
>             }
>
>             // Transform doc.xml with script.xsl. To exercise Xalan-J rather than
>             // the JDK's built-in processor, put xalan.jar on the classpath.
>             System.out.println("Start : " + new Date());
>             TransformerFactory factory = TransformerFactory.newInstance();
>             Transformer transformer =
>                     factory.newTransformer(new StreamSource(new File("script.xsl")));
>             try (OutputStream out = new FileOutputStream("out.txt")) {
>                 transformer.transform(new StreamSource(xmlfile), new StreamResult(out));
>             }
>             System.out.println("End : " + new Date());
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
> }
> {code}
> {code:xml|title=script.xsl}
> <?xml version="1.0"?>
> <xsl:stylesheet version="1.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>   <xsl:output method="text"/>
>   <xsl:template match="text()|@*"/>
>   <xsl:template match="/">
>     <xsl:apply-templates select="*"/>
>   </xsl:template>
>   <xsl:template match="article">
>     <xsl:value-of select="@id"/>
>     <xsl:text> </xsl:text>
>   </xsl:template>
> </xsl:stylesheet>
> {code}
--
This message was sent by Atlassian Jira
(v8.3.2#803003)