Thomas,

"Thomas Schleu" <[email protected]> wrote on 01/27/2010 07:47:09 AM:

> Michael,
>
> I know that the body text comes in pieces. That's why I check that the
> accumulated text buffer (sb) is empty when looking at the start of the
> characters.

The code you posted is assuming that the beginning of the first chunk will
start with "abc". There is no such guarantee. The text can be split anywhere
and when I ran your program I observed that for one of the elements "abc"
crosses a buffer boundary so on the first callback you only get the first
two characters: "ab". Your code needs to account for this. I see no issue
with Xerces.

> I also only check when I am inside the "item" element.
> The XML is very simple. It just repeats the same element over and over
> again.
> As I mentioned before the error comes when the XML total size exceeds 16kB
> and occurs when parsing the XML element that is behind the first 8kB.
> I looked at the parser source shortly and noticed that it uses an internal
> buffer of 8kB. That's why I assume the problem occurs when re-filling the
> buffer while in the middle of or after processing a character entity
> "".

I'm not sure what source you're looking at. Xerces' default buffer size is
2 KB. It's been that size for a long time. Are you sure you're actually
using Apache Xerces and not some derivative like what Sun ships in their
JDK?

> Once I removed all those character entities the parser worked as expected.
>
> Any help you can give?
>
> Thomas Schleu
> Chief Technology Officer
>
> Mail: mailto:[email protected]
> Fon: +49-30-390 485 0
> Fax: +49-30-390 485 55
>
> Canto GmbH
> Alt-Moabit 73
> D-10555 Berlin
> Germany
> http://www.canto.com
> Amtsgericht Berlin-Charlottenburg HRB 88566
> Geschäftsführer: Hans-Dieter Schädel

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [email protected]
E-mail: [email protected]
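
For illustration only, here is a minimal sketch of a handler that does not depend on where the parser splits the text. It is not the program from this thread. The class name `ItemTextCollector` and the helper `collect` are hypothetical; the `item` element and the `sb` buffer follow the description above. It uses the standard SAX API through JAXP, appends every `characters()` chunk while inside `item`, and looks at the text only in `endElement`, when the content is complete.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <item> element.
class ItemTextCollector extends DefaultHandler {
    private final List<String> items = new ArrayList<>();
    private final StringBuilder sb = new StringBuilder();
    private boolean inItem = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("item".equals(qName)) {
            inItem = true;
            sb.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element; a chunk can end anywhere,
        // so only accumulate here and make no assumption about its content.
        if (inItem) {
            sb.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName)) {
            // Only now is the element's text complete and safe to inspect.
            items.add(sb.toString());
            inItem = false;
        }
    }

    // Parses the given XML string and returns the text of each <item>.
    static List<String> collect(String xml) throws Exception {
        ItemTextCollector handler = new ItemTextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.items;
    }
}
```

A check such as `text.startsWith("abc")` can then be applied to each string returned by `ItemTextCollector.collect(xml)`, regardless of how many callbacks the parser used to deliver it.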
