Indeed, this is not an option: a SAX parser will not report this information, so in the general case XMLBeans doesn't even know what the literal text in the input document looked like.

Radu

-----Original Message-----
From: Patrick Hochstenbach [mailto:[EMAIL PROTECTED]
Sent: Monday, May 09, 2005 11:15 AM
To: [EMAIL PROTECTED]
Subject: RE: CDATA Heuristics

Hi Radu,

Thank you for the explanation of the heuristics in XMLBeans. We would be quite happy to switch this on/off on a per-document basis. This said, the ideal would be to be able to keep the CDATA the way our end-users typed them, but do I understand it correctly this isn't an option?

KR,
Patrick

On Fri, 6 May 2005, Radu Preotiuc-Pietro wrote:

> The issue of CDATA and entitization has come up a lot of times.
> XmlBeans is 100% infoset, but the XML infoset doesn't make any distinction as
> to how character data is represented. So the approach that it took was to
> decide on its own when should characters be entitized and when saved as a
> CDATA section. The algorithm is:
> - if the length of the text is < 32 chars, entitization is used
> - otherwise, if there are at least 5 '<' or '&' characters and they also
> account for at least 1% of the text length, CDATA is used.
>
> For V2, we looked into making this configurable, since we got feedback on
> this mailing list th
>
> Here is one of the proposals:
> - turn entitization on on a char by char basis via an XmlOption that
> basically says: "I want character x to always be entitized"
> - turn CDATA on/off on a per-document basis

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
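
For readers of the archive, the heuristic described in the quoted message of 6 May can be sketched roughly as follows. This is only an illustration of the rule as stated there, not the actual XMLBeans saver code: the class name CdataHeuristic, the method shouldUseCdata and the constant names are hypothetical, and the sketch ignores details a real serializer must handle (for example, text that itself contains "]]>").

```java
/** Hypothetical helper illustrating the rule from the thread; not part of the XMLBeans API. */
final class CdataHeuristic {
    // Thresholds taken from the description in the quoted message.
    static final int MIN_LENGTH = 32;        // shorter text is always entitized
    static final int MIN_SPECIAL_COUNT = 5;  // at least this many '<' or '&'

    private CdataHeuristic() {}

    /** Returns true if the text would be saved as a CDATA section, false if entitized. */
    static boolean shouldUseCdata(String text) {
        if (text.length() < MIN_LENGTH) {
            return false; // short text: entitization is used
        }
        int special = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<' || c == '&') {
                special++;
            }
        }
        // "At least 1% of the text length" means special / length >= 0.01,
        // written with integers as special * 100 >= length.
        return special >= MIN_SPECIAL_COUNT && special * 100L >= text.length();
    }
}
```

Under this sketch, CdataHeuristic.shouldUseCdata("a < b") returns false because the text is shorter than 32 characters, regardless of how many special characters it contains.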

