Hi Neil,

There was a related discussion [1][2] about the SymbolTable on this list back in March 2005. Do these large documents share similar names, or do they contain many unique names? Specifically, do your documents look like this?

Doc 1:
<doc><elem1/> <elem2/> . . . <elem99999/> <elem100000/></doc>
...
Doc n:
<doc><elem1-n/> <elem2-n/> . . . <elem99999-n/> <elem100000-n/></doc>

If they do, that would explain why you're running out of memory. The SymbolTable creates an entry for each unique name, so a reused parser keeps accumulating entries as new names appear. The last time this came up I proposed a workaround [3] (it reduces memory usage at the expense of speed).

Thanks.

[1] http://marc.theaimsgroup.com/?t=111099151200003&r=1&w=2
[2] http://marc.theaimsgroup.com/?t=111099151200003&r=2&w=2
[3] http://marc.theaimsgroup.com/?l=xerces-j-dev&m=111103024915201&w=2

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

Neil Bacon <[EMAIL PROTECTED]> wrote on 08/25/2006 02:55:10 AM:

> Hi,
> I've been running out of memory reusing the same XMLReader
> (xercesImpl-2.8.0) to parse many large documents.
> The documents reference the same DTD which references many entities.
> Profiling (with netbeans-5.0) reveals that the problem is with char[]s
> allocated by:
>
> org.apache.xerces.util.SymbolTable.$Entry.<init>
> org.apache.xerces.util.SymbolTable.addSymbol()
> org.apache.xerces.impl.XMLEntityScanner.scanName()
> org.apache.xerces.impl.XMLDTDScannerImpl.scanEntityDecl()
> ...
> Maybe it's storing the symbol table for the same DTD for each new
> document and never discarding it?
> Should it recognize a previously parsed DTD and reuse the existing
> symbol table?
>
> I've worked around it by using a new XMLReader for each document.
>
> Can I get DTDs and entities cached to improve performance?
> I'm using org.apache.xerces.util.XMLCatalogResolver.
>
> Cheers,
> Neil.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
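
P.S. To make the growth pattern concrete, here is a minimal sketch of a table that keeps one entry per unique name and never evicts anything. It is a toy model only: ToySymbolTable, parseDocument and sizeAfter are hypothetical names invented for this illustration, and the code does not reproduce the actual org.apache.xerces.util.SymbolTable implementation. It only shows why documents with per-document unique names make a shared table grow without bound, while documents that share names do not.

```java
import java.util.HashMap;
import java.util.Map;

public class SymbolTableGrowthSketch {

    /** Hypothetical stand-in for a parser symbol table: one entry per unique name, never evicted. */
    public static final class ToySymbolTable {
        private final Map<String, String> entries = new HashMap<>();

        /** Returns the canonical instance for the name, adding an entry on first sight. */
        public String addSymbol(String name) {
            String existing = entries.putIfAbsent(name, name);
            return existing != null ? existing : name;
        }

        public int size() {
            return entries.size();
        }
    }

    /** Simulates scanning one document whose element names are elem1<suffix> .. elemN<suffix>. */
    public static void parseDocument(ToySymbolTable table, int elements, String suffix) {
        table.addSymbol("doc");
        for (int i = 1; i <= elements; i++) {
            table.addSymbol("elem" + i + suffix);
        }
    }

    /** Parses the given number of documents with ONE shared table and returns its final size. */
    public static int sizeAfter(int docs, int elementsPerDoc, boolean uniqueNamesPerDoc) {
        ToySymbolTable table = new ToySymbolTable();
        for (int d = 1; d <= docs; d++) {
            // With unique names, document d uses elem1-d, elem2-d, ... as in the example above.
            parseDocument(table, elementsPerDoc, uniqueNamesPerDoc ? "-" + d : "");
        }
        return table.size();
    }

    public static void main(String[] args) {
        System.out.println("shared names: " + sizeAfter(5, 1000, false)); // 1001 entries
        System.out.println("unique names: " + sizeAfter(5, 1000, true));  // 5001 entries
    }
}
```

With shared names the table stops growing after the first document; with unique names it grows by one entry per element per document for as long as the same table is reused.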

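
The workaround Neil describes, a new XMLReader for each document, can be sketched with the standard JAXP API as follows. This is an illustration under assumptions, not the workaround from [3]: FreshReaderPerDocument, ElementCounter and parseAll are hypothetical names, the documents are parsed from in-memory strings for brevity, and whether discarding the reader actually releases the symbol table depends on the parser implementation in use.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class FreshReaderPerDocument {

    /** Counts start tags; stands in for whatever handler the application really uses. */
    static final class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    /** Parses each document with its own XMLReader and returns the total element count. */
    public static int parseAll(List<String> documents) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        int total = 0;
        for (String xml : documents) {
            // A new reader per document: state held by the previous reader is no
            // longer referenced and can be garbage collected.
            XMLReader reader = factory.newSAXParser().getXMLReader();
            ElementCounter counter = new ElementCounter();
            reader.setContentHandler(counter);
            reader.parse(new InputSource(new StringReader(xml)));
            total += counter.count;
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList(
            "<doc><elem1-1/><elem2-1/></doc>",
            "<doc><elem1-2/><elem2-2/></doc>");
        System.out.println(parseAll(docs));
    }
}
```

The cost is that each reader starts cold, so anything the parser would otherwise have reused across documents is rebuilt every time; that is the speed side of the trade-off.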