Re: memory leak with DTD entity references?

Neil Bacon Mon, 28 Aug 2006 16:38:40 -0700

Hi Michael,
Thanks for your reply.

Michael Glavassevich wrote:

Hi Neil,
There was a related discussion [1][2] about the SymbolTable on this list back in March 2005.

Thanks - yes, I did come across that thread before posting. Although closely related, I don't think it's the same issue, because that thread is about running out of memory parsing a single document, and my issue is specifically with reusing the same parser to parse many documents (using a limited set of DTDs). I don't have a problem if I get a new parser for each document.
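
The two usage patterns being compared can be sketched roughly as follows. This is only an illustrative sketch using the standard JAXP API, not the code from the setup described here: the class name, the tiny inline DTD, and the document count are all assumptions, and it does not involve a catalog resolver or the real USPTO DTDs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParserReuseSketch {

    // Hypothetical stand-in for a real document: an internal DTD subset
    // with one entity declaration and one reference to that entity.
    public static byte[] sampleDocument(int i) {
        String xml =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE doc [\n"
            + "  <!ELEMENT doc (#PCDATA)>\n"
            + "  <!ENTITY office \"USPTO\">\n"
            + "]>\n"
            + "<doc>application " + i + " filed at &office;</doc>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Pattern 1: a single DocumentBuilder is created once and reused
    // for every document (the case where memory growth was observed).
    public static int parseWithReusedParser(int count) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        int parsed = 0;
        for (int i = 0; i < count; i++) {
            Document d = builder.parse(new ByteArrayInputStream(sampleDocument(i)));
            if (d.getDocumentElement() != null) {
                parsed++;
            }
        }
        return parsed;
    }

    // Pattern 2: a new DocumentBuilder is created for each document
    // (the case reported as not showing the problem).
    public static int parseWithFreshParsers(int count) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        int parsed = 0;
        for (int i = 0; i < count; i++) {
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document d = builder.parse(new ByteArrayInputStream(sampleDocument(i)));
            if (d.getDocumentElement() != null) {
                parsed++;
            }
        }
        return parsed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reused parser:  " + parseWithReusedParser(100));
        System.out.println("fresh parsers:  " + parseWithFreshParsers(100));
    }
}
```

To compare the two patterns on real data, one would presumably run each over the same set of documents and watch heap usage; a toy document this small is not expected to reproduce the growth.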

Could the parser be keeping the symbol table from previous documents but not reusing it when it comes across the same DTD in a new document? Perhaps this behaviour could be affected by my use of org.apache.xerces.util.XMLCatalogResolver?

Do these large documents contain similar names, or do they contain many unique names? Specifically, do your documents look like this?

Doc 1: <doc><elem1/> <elem2/> . . . <elem99999/> <elem100000/></doc>
...
Doc n: <doc><elem1-n/> <elem2-n/> . . . <elem99999-n/> <elem100000-n/></doc>

No, the data is not like that. There are a decent number of element names as well as some heavily reused elements. The DTDs contain more than 2000 entity declarations. I'm processing US patent application data from the USPTO using their DTDs:


   * us-patent-application-v41-2005-08-25.dtd
   * us-patent-application-v40-2004-12-02.dtd
   * us-sequence-listing-2004-03-09.dtd
   * pap-v16-2002-01-01.dtd
   * pap-v15-2001-01-31.dtd

Cheers,
   Neil Bacon
   Cambia

