Hi -- I'm new here.  :-)
 
I am working with some large XML files (~6 MB each) that serve as indexes into hundreds of smaller XML content files.  The index files contain elements with an attribute whose value is an XPath string, something like this:
 
    <page xpath="/sm/[EMAIL PROTECTED]&apos;09_02_0045&apos;]"/>
 
Each index file contains thousands of these elements.  As the indexes have grown larger, I have started seeing the following exception:
 
    org.dom4j.DocumentException: Error on line 31111 of document: Parser has reached the entity expansion limit "64,000" set by the Application. Nested exception: Parser has reached the entity expansion limit "64,000" set by the Application.
            at org.dom4j.io.SAXReader.read(SAXReader.java:355)
            at org.dom4j.io.SAXReader.read(SAXReader.java:297)
 
The problem seems to be all of those "&apos;" entity references.  (There are as many as 85,000 of them in some of the files.)  But the apostrophes are required by the XPath syntax.  So I have a few questions:
 
1) Why does this 64,000-entity limit exist?
 
2) The error message says the limit was "set by the Application."  Does this mean there is a configurable limit that I can modify to avoid the exception?
 
3) The problem goes away if I manually replace all the entities ( &apos; ) with character references ( &#39; ).  However, my code sets the dom4j attribute value using a plain apostrophe in a Java string:
 
    pageElement.addAttribute( "xpath", "/sm/[EMAIL PROTECTED]'09_02_0045']" );
 
dom4j is apparently converting the apostrophes to entities as the document is written to disk.  Is there a way to tell dom4j to write the data using character references, rather than entities?
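 
For reference, the manual replacement described in (3) could be scripted along these lines.  This is only a rough sketch using plain Java, not dom4j: the class and method names (AposRewriter, countEntities, toCharRefs) are made up for illustration, and it assumes the files are UTF-8, fit in memory, and contain no "&apos;" inside a CDATA section or comment (where a blind text replacement would change the content).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of dom4j: post-processes a serialized index file.
class AposRewriter {
    static final String ENTITY = "&apos;";   // predefined entity reference
    static final String CHAR_REF = "&#39;";  // equivalent numeric character reference

    // Counts non-overlapping occurrences of "&apos;" in the text.
    static int countEntities(String xml) {
        int count = 0;
        int from = 0;
        while ((from = xml.indexOf(ENTITY, from)) != -1) {
            count++;
            from += ENTITY.length();
        }
        return count;
    }

    // Replaces every "&apos;" with "&#39;".
    // Assumption: none of them sit inside CDATA sections or comments.
    static String toCharRefs(String xml) {
        return xml.replace(ENTITY, CHAR_REF);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java AposRewriter <input.xml> <output.xml>");
            return;
        }
        // Assumption: the index file is UTF-8 encoded.
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("&apos; references found: " + countEntities(xml));
        Files.write(Paths.get(args[1]), toCharRefs(xml).getBytes(StandardCharsets.UTF_8));
    }
}
```

Running it as "java AposRewriter index.xml index-fixed.xml" (file names are placeholders) reports how many references a file contains, which can be compared against the 64,000 figure in the exception, and writes a copy that uses character references instead.  I would still prefer to have dom4j write the file that way in the first place.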
 
  Thanks,
- Chip Whitmer
  Mobile Productivity, Inc.
 
