Heinz, Chris wrote:
Hey, I’m a noob here, so if anyone wants to point me to the archives of this mailing list to search for my problem, that’s fine.

My problem is that I have three special characters being placed into formatted text: return, non-breaking spaces, and soft hyphens. I can input them as &#x0D;, &#xA0;, and &#xAD;. The first two Xerces handles fine, the third I seem to be getting a standard hyphen???
Have you examined the content of the document to verify this? I don't know of any code in Xerces-C that would translate a soft hyphen to a regular hyphen.

But when I write them out, they go in as non-printing control characters. Xerces can import those fine, so I can round trip, but, the non-printing characters aren’t too user-friendly.
I'm not sure I understand your question and the problems you're seeing. Are you trying to configure the serializer so it generates entities for certain characters? If so, there's no way to do that.


I have defined in my dtd file:

<!ENTITY return "&#x0D;">
<!ENTITY nbsp "&#xA0;">
<!ENTITY softhyphen "&#xAD;">
In general, the DTD is processed by the parser, the entities are expanded, and their identities are lost. There is no connection between the DTD in the input document, and the document the serializer generates.
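
To illustrate why the entity names cannot come back on output, here is a minimal sketch of what expansion amounts to. This is not Xerces-C code: `expandEntities` and `demonstrateIdentityLoss` are hypothetical helpers, and the map simply stands in for the `<!ENTITY ...>` declarations in the DTD.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper (not a Xerces-C API): replaces each &name; whose
// name appears in the table with its replacement text. Unknown
// references are copied through unchanged.
std::string expandEntities(const std::string& text,
                           const std::map<std::string, std::string>& entities)
{
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] == '&') {
            const std::size_t semi = text.find(';', i + 1);
            if (semi != std::string::npos) {
                const auto it = entities.find(text.substr(i + 1, semi - i - 1));
                if (it != entities.end()) {
                    out += it->second;  // only the replacement text survives
                    i = semi + 1;
                    continue;
                }
            }
        }
        out += text[i++];
    }
    return out;
}

// After expansion, text written with a reference is byte-for-byte the
// same as text written with the literal character.
void demonstrateIdentityLoss()
{
    const std::map<std::string, std::string> dtd = {
        {"nbsp", "\xC2\xA0"},        // U+00A0 encoded as UTF-8
        {"softhyphen", "\xC2\xAD"},  // U+00AD encoded as UTF-8
    };
    const std::string viaReference = expandEntities("co&softhyphen;op", dtd);
    const std::string viaLiteral = std::string("co") + "\xC2\xAD" + "op";
    assert(viaReference == viaLiteral);
}
```

Once the two strings compare equal, nothing downstream can tell which one was originally written as `&softhyphen;`, which is the situation the serializer is in.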


And tried &return;, etc, that didn’t seem to work at all.
Didn't seem to work in what way?

I’ve checked DomOptions and looked at DOMSerializer, haven’t seen anything that looks like it would help.
The usual way to handle this is to specify US-ASCII as the encoding. Since that encoding only supports characters below 128, all other characters will be written as numeric character references.

However, that will not solve the problem with U+000D, which should already be written as a numeric character reference. If that's not the case, the Xerces serializer has a bug.
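
As a rough sketch of the effect described above, the following stand-alone function shows what output limited to US-ASCII looks like. It is not the Xerces-C serializer and makes no claim about its internals: `escapeForAscii` is a hypothetical name, and the exact reference format (`&#xA0;` rather than, say, `&#160;`) is an assumption. In Xerces-C itself the encoding is selected on the serializer (for instance `DOMWriter::setEncoding` in the 2.x series, assuming that is the release in use).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of Xerces-C: writes character data as
// US-ASCII, turning anything the encoding cannot carry directly into a
// numeric character reference.
std::string escapeForAscii(const std::u32string& text)
{
    std::string out;
    for (const char32_t cp : text) {
        assert(cp <= 0x10FFFF);  // must be a valid Unicode code point
        if (cp == U'&') {
            out += "&amp;";
        } else if (cp == U'<') {
            out += "&lt;";
        } else if (cp == 0x0D || cp >= 0x80) {
            // U+000D is escaped even though it is ASCII, because a
            // literal carriage return would be normalized away when
            // the document is parsed again.
            char buf[16];
            std::snprintf(buf, sizeof buf, "&#x%lX;",
                          static_cast<unsigned long>(cp));
            out += buf;
        } else {
            out += static_cast<char>(cp);
        }
    }
    return out;
}
```

With this sketch, a call such as `escapeForAscii(U"a\u00A0b")` yields `a&#xA0;b`, so the non-breaking space is visible in the file instead of appearing as a non-printing character.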

Dave
