I see...I assumed the entity reference &nbsp; was meant to be read by the browser in the xhtml output, not the internal xslt processor.
I'll look into Saxon, but for now I think I'm going to have to customize en.xml to just use spaces instead of entity references.

If I *did* want to use a reference for the browser only, would &#160; work?

xhtml output => &#160;

On 10/31/07, Bob Stayton <[EMAIL PROTECTED]> wrote:
> ----- Original Message -----
> From: "Anthony Ettinger" <[EMAIL PROTECTED]>
> To: "Bob Stayton" <[EMAIL PROTECTED]>
> Cc: "Dave Pawson" <[EMAIL PROTECTED]>; <[email protected]>
> Sent: Wednesday, October 31, 2007 1:09 PM
> Subject: Re: [docbook] invalid characters for ISO-8859-1 response
>
> > Sure, unicode makes sense...I could be missing something but I
> > would've left entity references alone...I still don't see what is
> > gained by converting Œ vs. just leaving it as Œ in the
> > output...or simply leaving it as a space.
>
> Ah, now I think I see what you are getting at. If you type &nbsp; for a
> non-breaking space, why doesn't it preserve that character as the string
> "&nbsp;" in the output? The answer is that the input representation has no
> direct connection to the output representation.
>
> When an input XML document is parsed into memory, all characters are
> converted to Unicode internally, regardless of their initial
> representation. There is no record in the loaded memory that the input was
> "&nbsp;", it is all Unicode in memory. After processing in memory, the XML
> is output using a serializer whose job is to convert the Unicode strings
> into an output string in some encoding. An encoding has to be chosen, and
> it is not selected based on the input encoding (which is no longer known to
> the processor). The default output encoding is UTF-8, but you can specify
> any of several different encodings for the serializer to use.
>
> That said, one option you might look at is using Saxon instead of libxml2,
> and use a Saxon extension to control how characters are represented in the
> output. After all, even if your output encoding is UTF-8, you could still
> output the six-character string "&nbsp;" for a non-breaking space instead
> of the UTF-8 single hex character, and it would still be interpreted as a
> non-breaking space. Saxon provides that choice. See:
>
> http://www.sagehill.net/docbookxsl/OutputEncoding.html#SaxonCharacter
>
> Bob Stayton
> Sagehill Enterprises
> DocBook Consulting
> [EMAIL PROTECTED]

--
Anthony Ettinger
Ph: 408-656-2473
var (bonita, farley) = new Dog;
farley.barks("very loud");
bonita.barks("at strangers");
http://chovy.dyndns.org/resume/
http://utuxia.com/consulting
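
For anyone following the thread, the parse-then-serialize behavior described in the quoted message can be seen in a minimal sketch. It uses Python's standard xml.etree.ElementTree rather than libxml2 or Saxon (an assumption made only for illustration, so the exact output of those processors may differ), and the sample <p> element and variable names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Two inputs that differ only in how the non-breaking space is written:
# once as a numeric character reference, once as the literal character.
as_reference = ET.fromstring("<p>a&#160;b</p>")
as_literal = ET.fromstring("<p>a\u00a0b</p>")

# After parsing, both hold the same Unicode string; there is no record
# of which spelling the input used.
same_in_memory = as_reference.text == as_literal.text == "a\u00a0b"

# The serializer alone decides the output form, driven by the encoding
# requested at output time.
utf8_bytes = ET.tostring(as_literal, encoding="utf-8")
ascii_bytes = ET.tostring(as_literal, encoding="us-ascii")

print(same_in_memory)
print(utf8_bytes)
print(ascii_bytes)
```

In this sketch the UTF-8 output carries the character as raw bytes, while the ASCII output falls back to a numeric character reference because the character cannot be encoded directly. A browser should treat both as a non-breaking space.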
