RE: CDATA Heuristics

Xiaoming Liu Tue, 10 May 2005 14:21:06 -0700

A flag to turn CDATA on/off on a per-document basis would be very helpful for
our use case, but I wonder whether a further step can be taken.


In a typical application here, we need to process XML documents delivered
by various parties, so we have no control over how CDATA is used in
these third-party documents. We now want to combine these XML documents
with XMLBeans and, for provenance and fixity reasons, we really want
to keep the original CDATA sections intact in the composite document. Right
now, however, we have no control over this.

It would be really good if XMLBeans could preserve CDATA sections when
loading and saving an XML document.
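
To make the current behaviour concrete, the heuristic Radu describes in the
quoted message below can be sketched roughly as follows. This is only an
illustration written from that description, not the XMLBeans source: the
class and method names are made up, and I am assuming that longer text which
misses the thresholds falls back to entitization, which the description
leaves implicit.

```java
// Hypothetical sketch of the save-time CDATA heuristic as described in
// this thread. Not taken from XMLBeans; names are illustrative only.
final class CdataHeuristicSketch {

    // Returns true if the text would be written as a CDATA section,
    // false if its special characters would be entitized instead.
    static boolean useCdata(String text) {
        // Rule 1: text shorter than 32 chars is always entitized.
        if (text.length() < 32) {
            return false;
        }
        // Count the characters that would otherwise need escaping.
        int special = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<' || c == '&') {
                special++;
            }
        }
        // Rule 2: at least 5 such characters, and they make up at
        // least 1% of the text length. Otherwise (assumed) entitize.
        return special >= 5 && special * 100 >= text.length();
    }

    public static void main(String[] args) {
        String shortText = "a < b && c < d";
        String longText = "if (a < b && c < d && e < f) { return x & y; }";
        System.out.println(useCdata(shortText));
        System.out.println(useCdata(longText));
    }
}
```

The point is that the outcome depends only on the text content, so whether
the source document used a CDATA section has no influence on how it is
saved.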


thanks,
xiaoming



> Date: Fri, 6 May 2005 13:55:06 -0700
> From: Radu Preotiuc-Pietro <[EMAIL PROTECTED]>
> Reply-To: [email protected]
> To: [email protected]
> Subject: RE: CDATA Heuristics
>
> The issue of CDATA and entitization has come up a lot of times.
> XmlBeans is 100% infoset, but the XML infoset doesn't make any distinction as 
> to how character data is represented. So the approach that it took was to 
> decide on its own when should characters be entitized and when saved as a 
> CDATA section. The algorithm is:
> - if the length of the text is < 32 chars, entitization is used
> - otherwise, if there are at least 5 '<' or '&' characters and they also 
> account for at least 1% of the text length, CDATA is used.
>
> For V2, we looked into making this configurable, since we got feedback on 
> this mailing list that it would be useful, but never got around to doing it.
>
> Here is one of the proposals:
> - turn entitization on on a char by char basis via an XmlOption that 
> basically says: "I want character x to always be entitized"
> - turn CDATA on/off on a per-document basis
>
> What do people think?
> Thanks,
> Radu
>
> -----Original Message-----
> From: Patrick Hochstenbach [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 05, 2005 11:27 PM
> To: [email protected]
> Subject: CDATA Heuristics
>
>
>
> Hi,
>
> in our library we are very interested in using XMLBeans in a document
> archiving project which stores XML files in a database. The excellent
> round-tripping characteristics of XMLBeans are crucial in
> our project. But, with the serialization of text containing
> escaped '<'-s and '&'-s we're at a loss. XMLBeans seems to
> have some heuristics to decide when text containing these
> characters should be saved as CDATA and when not.
>
> Is it possible to decide at runtime when text should be saved in
> CDATA sections and when not? Or better, can in some way CDATA
> sections be preserved?
>
> Best regards,
>
> Patrick
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
