RE: CDATA Heuristics

Patrick Hochstenbach Mon, 09 May 2005 11:14:40 -0700

Hi Radu,

Thank you for the explanation of the heuristics in XMLBeans. We would be quite happy to switch this on/off on a per-document basis. That said, the ideal would be to keep the CDATA sections exactly as our end-users typed them, but do I understand correctly that this isn't an option?

KR,
Patrick

On Fri, 6 May 2005, Radu Preotiuc-Pietro wrote:

The issue of CDATA and entitization has come up many times.
XmlBeans is 100% infoset-based, but the XML infoset doesn't make any distinction as 
to how character data is represented. So the approach that it took was to 
decide on its own when characters should be entitized and when they should be 
saved as a CDATA section. The algorithm is:
- if the length of the text is < 32 chars, entitization is used
- otherwise, if there are at least 5 '<' or '&' characters and they also 
account for at least 1% of the text length, CDATA is used.
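
The rule above can be sketched in plain Java as follows. This is only an illustration of the rule as described in this message, not the XmlBeans source: the class and method names are hypothetical, '<' and '&' are assumed to be counted together, and text that meets neither CDATA condition is assumed to fall back to entitization.

```java
// Hypothetical sketch of the heuristic described above; not XmlBeans code.
public final class CdataHeuristic {
    // Text shorter than this is always entitized.
    private static final int MIN_LENGTH = 32;
    // Minimum number of '<' or '&' characters needed before CDATA is considered.
    private static final int MIN_SPECIAL_COUNT = 5;

    private CdataHeuristic() {}

    /** Returns true if the text would be saved as a CDATA section, false if entitized. */
    public static boolean preferCdata(String text) {
        if (text.length() < MIN_LENGTH) {
            return false;
        }
        int special = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<' || c == '&') {
                special++;
            }
        }
        // "at least 1% of the text length" means special / length >= 0.01,
        // written as special * 100 >= length to stay in integer arithmetic.
        return special >= MIN_SPECIAL_COUNT && special * 100L >= text.length();
    }
}
```

Under these assumptions, a 40-character string containing five '<' characters would be written as CDATA, while a 600-character string with the same five characters would be entitized, because 5 is less than 1% of 600.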

For V2, we looked into making this configurable, since we got feedback on this 
mailing list th>
Here is one of the proposals:
- turn entitization on, character by character, via an XmlOption that basically says: 
"I want character x to always be entitized"
- turn CDATA on/off on a per-document basis
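
To make the first item of the proposal concrete, here is a minimal sketch of what per-character entitization could look like. It is plain Java and does not use any XmlBeans API: the class name, the method, and the choice of decimal character references are all assumptions made for illustration. It works on single Java chars, so it does not handle characters outside the Basic Multilingual Plane.

```java
import java.util.Set;

// Hypothetical sketch of "always entitize character x"; not an XmlBeans API.
public final class ForcedEntitizer {
    private ForcedEntitizer() {}

    /**
     * Escapes '<' and '&' as usual and additionally writes every character
     * in alwaysEntitize as a decimal character reference.
     */
    public static String entitize(String text, Set<Character> alwaysEntitize) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<') {
                out.append("&lt;");
            } else if (c == '&') {
                out.append("&amp;");
            } else if (alwaysEntitize.contains(c)) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```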

