I have a very serious problem that might affect my project as a whole.
My project converts an input XML file to an output XML format, which is predefined.
The conversion is done in Java (JDK 1.3.1) with JAXP 1.1, using the Crimson parser to read the input file. We
use Crimson's SAX parser for the implementation.
My files range from 4 to 40 MB. When I parse a file larger than 1 MB, the parser fails to deliver some of the character data at certain places, and it always happens at the same places. I am sure that the input file contains that data in the
correct format. The problem affects only the character data, not the tags: the start element
and end element callbacks work fine, and only the characters callback is missing text.
1. You need to upgrade to Xerces (but this will not fix your problem).
2. See http://www.cafeconleche.org/books/xmljava/chapters/ch06s07.html (This likely will fix your problem)
In brief, when there's a large amount of text between two tags with no intervening markup, the parser may choose to call characters() multiple times even though it doesn't need to. Xerces generally won't pass more than 16K of text in one call. Crimson is limited to about 8K of text per call. At the extreme, I have even seen a parser pass a single character at a time to the characters() method. You must not assume that the parser will pass you the maximum contiguous run of text in a single call to characters().
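A minimal sketch of the usual workaround, assuming the handler only needs the text of leaf elements: append every characters() chunk to a buffer and read the buffer in endElement(). The class name TextAccumulatingHandler and the helper collectText are hypothetical names made up for this illustration, and the generics assume a JDK newer than 1.3.1 (on 1.3.1, use a raw List instead).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of each leaf element,
// no matter how many characters() calls the parser splits it into.
class TextAccumulatingHandler extends DefaultHandler {
    private final StringBuffer text = new StringBuffer();
    private final List<String> values = new ArrayList<String>();

    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // A new element begins: discard anything buffered so far.
        text.setLength(0);
    }

    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text: append, never assign.
        text.append(ch, start, length);
    }

    public void endElement(String uri, String localName, String qName) {
        // Only here is the element's text known to be complete.
        if (text.length() > 0) {
            values.add(text.toString());
        }
        text.setLength(0);
    }

    List<String> getValues() {
        return values;
    }

    // Hypothetical convenience method: parse an XML string, return the texts.
    static List<String> collectText(String xml) throws Exception {
        TextAccumulatingHandler handler = new TextAccumulatingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getValues();
    }
}
```

With this pattern the chunk size a given parser happens to use no longer matters. Mixed content (text interleaved with child elements) would need a stack of buffers rather than the single buffer used here.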
--
Elliotte Rusty Harold
[EMAIL PROTECTED]
Effective XML (Addison-Wesley, 2003)
http://www.cafeconleche.org/books/effectivexml http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA