[ https://issues.apache.org/jira/browse/XALANJ-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929759#comment-16929759 ]
Gary Gregory commented on XALANJ-2607:
--------------------------------------

I think we should raise the platform requirements for Xalan to at least Java 7 if not 8.

> Performance improvement for large documents using ID attributes
> ---------------------------------------------------------------
>
>                 Key: XALANJ-2607
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2607
>             Project: XalanJ2
>          Issue Type: Improvement
>      Security Level: No security risk; visible to anyone(Ordinary problems in
> Xalan projects. Anybody can view the issue.)
>          Components: DTM
>    Affects Versions: 2.7.2
>         Environment: Tested on but not limited to: Windows, x64, JRE 1.8
>            Reporter: Matthias Urban
>            Assignee: Steven J. Hathaway
>            Priority: Major
>         Attachments: DTMStringPool.patch
>
>
> XalanJ gets very slow for large XML documents using ID attributes often used
> in article lists. If, for instance, an article list with 1.000.000 entries is
> parsed, then it takes 6 minutes (on my machine) just to build the DTM. This
> is due to a design decision in DTMStringPool utilizing a fixed size hash
> table of 101 entries. This works astoundingly well for documents with less
> than 10.000 different attribute values. Then it starts to get slower and
> slower.
> I've tested multiple solutions, like to increase the hash table size to
> 100.000 and 1.000.000 (overkill for everyday documents), or make it
> configurable leaving the current size the default (needs to introduce a
> 'backdoor' system property). In the end the best solution was to use a simple
> HashMap for the string lookup. It proved to be as fast as the fixed size
> table combined with a very good scalability. See the patch attached to this
> issue. It was created using the current trunk version of DTMStringPool.java.
> Here is an example for testing. Without the patch applied it takes about 6
> minutes to finish. With the patch applied it only takes 3 seconds!
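
To make the scaling argument in the quoted description concrete, here is a minimal sketch of the two lookup strategies it compares. It is not the code from DTMStringPool.java or from the attached patch: the class names (StringPoolSketch, FixedBucketPool, MapPool) and method names are hypothetical, and only the bucket count of 101 is taken from the issue. It also assumes Java 7 or later for the diamond operator. With a fixed number of buckets each chain grows roughly as n/101, so interning n distinct strings costs on the order of n*n/202 comparisons, while the HashMap variant resizes itself and stays close to constant time per lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StringPoolSketch {

    /** Hypothetical pool with a fixed number of hash buckets and chained entries. */
    static final class FixedBucketPool {
        private static final int BUCKETS = 101; // table size named in the issue
        private final List<String> strings = new ArrayList<>();
        private final int[] head = new int[BUCKETS];          // first entry per bucket, -1 if empty
        private final List<Integer> next = new ArrayList<>(); // next entry in the same chain

        FixedBucketPool() {
            Arrays.fill(head, -1);
        }

        int stringToIndex(String s) {
            int bucket = (s.hashCode() & 0x7fffffff) % BUCKETS;
            // Walk the chain; its length grows linearly with the number of distinct strings.
            for (int i = head[bucket]; i != -1; i = next.get(i)) {
                if (strings.get(i).equals(s)) {
                    return i;
                }
            }
            int index = strings.size();
            strings.add(s);
            next.add(head[bucket]);
            head[bucket] = index;
            return index;
        }

        String indexToString(int index) {
            return strings.get(index);
        }
    }

    /** Hypothetical pool that delegates the lookup to a HashMap, which grows as needed. */
    static final class MapPool {
        private final List<String> strings = new ArrayList<>();
        private final Map<String, Integer> indexByString = new HashMap<>();

        int stringToIndex(String s) {
            Integer known = indexByString.get(s);
            if (known != null) {
                return known;
            }
            int index = strings.size();
            strings.add(s);
            indexByString.put(s, index);
            return index;
        }

        String indexToString(int index) {
            return strings.get(index);
        }
    }

    public static void main(String[] args) {
        int n = 50000; // kept small so the fixed-bucket variant still finishes quickly

        long start = System.nanoTime();
        FixedBucketPool fixed = new FixedBucketPool();
        for (int i = 0; i < n; i++) {
            fixed.stringToIndex(String.valueOf(i));
        }
        System.out.println("fixed buckets: " + (System.nanoTime() - start) / 1000000 + " ms");

        start = System.nanoTime();
        MapPool map = new MapPool();
        for (int i = 0; i < n; i++) {
            map.stringToIndex(String.valueOf(i));
        }
        System.out.println("HashMap:       " + (System.nanoTime() - start) / 1000000 + " ms");
    }
}
```

Both variants hand out the same indices for the same input order, so the difference is purely in lookup cost. Raising n in main should show the fixed-bucket timing growing roughly quadratically; the absolute numbers will depend on the machine and JVM.
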
> {code:title=Test.java}
> import java.io.File;
> import java.io.FileOutputStream;
> import java.io.FileWriter;
> import java.util.Date;
> import javax.xml.transform.Transformer;
> import javax.xml.transform.TransformerFactory;
> import javax.xml.transform.stream.StreamResult;
> import javax.xml.transform.stream.StreamSource;
> public class Test {
>     public static void main(String[] args) {
>         try {
>             File xmlfile = new File("doc.xml");
>             FileWriter fout = new FileWriter(xmlfile);
>             fout.write("<catalogue>\n");
>             for (int i = 0; i < 1000000; i++) {
>                 fout.append("<article id=\"" + i + "\">articlename</article>\n");
>             }
>             fout.write("</catalogue>");
>             fout.close();
>             System.out.println("Start : " + new Date());
>             TransformerFactory factory = TransformerFactory.newInstance();
>             Transformer transformer = factory.newTransformer(new StreamSource(new File("script.xsl")));
>             transformer.transform(new StreamSource(xmlfile), new StreamResult(new FileOutputStream("out.txt")));
>             System.out.println("End : " + new Date());
>         }
>         catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
> }
> {code}
> {code:xml|title=script.xsl}
> <?xml version="1.0"?>
> <xsl:stylesheet version="1.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>     <xsl:output method="text"/>
>     <xsl:template match="text()|@*"/>
>     <xsl:template match="/">
>         <xsl:apply-templates select="*"/>
>     </xsl:template>
>     <xsl:template match="article">
>         <xsl:value-of select="@id"/>
>         <xsl:text> </xsl:text>
>     </xsl:template>
> </xsl:stylesheet>
> {code}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)