RE: CDATA Heuristics

Radu Preotiuc-Pietro Mon, 09 May 2005 12:38:07 -0700

Indeed, this is not an option; a SAX parser will not report this information, 
so in the general case XMLBeans doesn't even know what the literal text in the 
input document looked like.


Radu

-----Original Message-----
From: Patrick Hochstenbach [mailto:[EMAIL PROTECTED]
Sent: Monday, May 09, 2005 11:15 AM
To: [EMAIL PROTECTED]
Subject: RE: CDATA Heuristics


Hi Radu,

Thank you for the explanation of the heuristics in XMLBeans. We would be
quite happy to switch this on/off on a per-document basis. That said,
the ideal would be to keep the CDATA sections the way our end users 
typed them, but do I understand correctly that this isn't an option?

KR,
Patrick

On Fri, 6 May 2005, Radu Preotiuc-Pietro wrote:

> The issue of CDATA and entitization has come up many times.
> XmlBeans is 100% infoset, but the XML infoset doesn't make any distinction as 
> to how character data is represented. So the approach that it took was to 
> decide on its own when characters should be entitized and when they should be 
> saved as a CDATA section. The algorithm is:
> - if the length of the text is < 32 chars, entitization is used
> - otherwise, if there are at least 5 '<' or '&' characters and they also 
> account for at least 1% of the text length, CDATA is used.
>
> For V2, we looked into making this configurable, since we got feedback on 
> this mailing list th>
> Here is one of the proposals:
> - turn entitization on for individual characters via an XmlOption that 
> basically says: "I want character x to always be entitized"
> - turn CDATA on/off on a per-document basis
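
For readers who want to see the quoted heuristic spelled out, here is a
minimal sketch in plain Java. It is not the XMLBeans source; the class name,
method name, and constants are hypothetical and only restate the two rules
from the message above (text shorter than 32 chars is entitized; longer text
is saved as CDATA when it has at least 5 '<' or '&' characters and they make
up at least 1% of its length). The sketch assumes that text failing the
second rule falls back to entitization, which the message implies but does
not state.

```java
// Hypothetical illustration of the heuristic described in the thread.
// Names and structure are assumptions, not the actual XMLBeans code.
final class CdataHeuristic {
    // Thresholds as stated in the quoted message.
    static final int MIN_LENGTH = 32;        // shorter text is always entitized
    static final int MIN_SPECIAL_CHARS = 5;  // minimum count of '<' or '&'
    static final int MIN_PERCENT = 1;        // minimum share of the text length

    private CdataHeuristic() {}

    /** Returns true if the text would be saved as CDATA, false if entitized. */
    static boolean useCdata(String text) {
        if (text.length() < MIN_LENGTH) {
            return false;
        }
        int special = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<' || c == '&') {
                special++;
            }
        }
        // special / length >= MIN_PERCENT / 100, written without division.
        return special >= MIN_SPECIAL_CHARS
                && special * 100L >= (long) MIN_PERCENT * text.length();
    }
}
```

A caller would check CdataHeuristic.useCdata(text) before writing each text
node. Note that a real serializer also has to handle text containing "]]>",
which cannot appear inside a single CDATA section; the sketch leaves that out.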

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

