On Sep 3, 2008, at 18:35, Steffanina, Jeff wrote:

Hi Jeff

> There is always one MORE option to consider!!
>
> What would you suggest as the best way to handle this?

I think I'd opt for (N)umeric (C)haracter (R)eferences. My reasoning: if you change the BASIC code to emit the sequence '&#232;' (the NCR for 'è'), it will never, ever have to be changed again (unless Unicode somehow decided to alter its code points). You can change the encoding in the XML header all you want; because an NCR consists only of ASCII characters, it keeps working under any ASCII-compatible encoding.
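
A minimal sketch of the NCR approach, in Python rather than BASIC and purely as an illustration (`to_ncr` is a hypothetical helper, not part of any library). It escapes markup characters, turns every non-ASCII character into a decimal NCR, and then checks that the same text parses to the same characters whatever ASCII-compatible encoding the XML header declares.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_ncr(text: str) -> str:
    """Hypothetical helper: escape &, <, > and replace non-ASCII chars with NCRs."""
    # 'xmlcharrefreplace' turns e.g. 'é' into '&#233;' when encoding to ASCII.
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


body = to_ncr("caf\u00e9 cr\u00e8me")  # 'caf&#233; cr&#232;me'

# The NCR-only body is pure ASCII, so the declared encoding does not matter
# as long as it is ASCII-compatible.
for enc in ("US-ASCII", "ISO-8859-1", "UTF-8"):
    doc = f'<?xml version="1.0" encoding="{enc}"?><item>{body}</item>'.encode("ascii")
    assert ET.fromstring(doc).text == "caf\u00e9 cr\u00e8me"
```

The `escape` call matters only if the text can contain '&' or '<'; the NCR conversion itself is done by the standard `xmlcharrefreplace` codec error handler.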

On the other hand, if you have a LOT of those characters, NCRs can make your XML noticeably bulkier: instead of 1 byte per character you generate 6-8 bytes for each one, and the XML parser has to read every byte from '&' up to and including ';' instead of a single byte. The character code you mentioned earlier (130) is not ASCII at all, since US-ASCII only defines codes 0-127; it is the decimal value of 'é' in the IBM PC code page 437 (and 850). So if you're concerned with the size of the XML and do not want to generate 6 bytes per character, try declaring the encoding your bytes are actually in (e.g. "IBM437", if your parser supports it) as the encoding of the source XML.
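
The size trade-off and the code-page point can be checked with a small Python sketch. It is only an illustration and assumes the data really is in code page 437, and that the parser accepts an "IBM437" declaration; Python's expat binding resolves that name through its codec registry, but other parsers may not support it.

```python
import xml.etree.ElementTree as ET

# Byte 130 (0x82) is 'é' in IBM code page 437 (and 850) ...
assert bytes([130]).decode("cp437") == "\u00e9"

# ... but it is not a US-ASCII character: ASCII only defines 0-127.
try:
    bytes([130]).decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print("byte 130 valid in US-ASCII:", ascii_ok)

# Bytes needed for one 'é': an NCR versus the raw code-page byte.
ncr_size = len("&#233;".encode("ascii"))
raw_size = len("\u00e9".encode("cp437"))
print(ncr_size, raw_size)

# Declaring the encoding the bytes are really in lets the parser decode them
# (assumes the parser accepts the name "IBM437"; Python maps it to cp437).
doc = b'<?xml version="1.0" encoding="IBM437"?><item>caf\x82</item>'
text = ET.fromstring(doc).text
assert text == "caf\u00e9"
```

With a single-byte encoding declared correctly, each accented character costs one byte instead of six, at the price of tying the file to that one code page.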


HTH!

Andreas