I have a DHTML editor embedded in a web page. The output of the editor is handled by Perl and written to text files.
Sometimes text is pasted into the editor from a Word document, and it contains Unicode characters from sets such as Latin Extended Additional. (This is on a win2k machine.)

Question: I have noticed that when special Unicode characters are pasted into the editor, they are added to the text file in a consistent, yet garbled fashion (e.g. "ö" or "î"). When the text file is rendered as HTML, they turn out fine, just as they should, but I'd still prefer to have proper HTML entity numbers in the file, e.g. &#7781; for ṥ.

The garbled characters always represent the same Unicode characters, so I guess I could just have Perl replace them with the proper HTML entity numbers before writing the text to a file. But I was wondering what is going on here: why do I get garbled characters in the first place? Can this somehow be "repaired" using Perl libraries or modules that convert ordinary text, or should I do something with "locale"?

I realize that this is only partly a Perl question, and a confused one at that - so apologies if this query is misdirected or, er, not particularly intelligible :-)

best regards,
Birgit Kellner
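
To make the "repair" idea concrete, here is a minimal sketch, not a confirmed diagnosis. It assumes that the editor submits the text as UTF-8, that those bytes are written to the file unchanged and only look garbled when viewed as a single-byte encoding, and that the Encode module is available (it ships with Perl 5.8 and later). The helper name utf8_bytes_to_entities is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: takes raw bytes that are ASSUMED to be UTF-8
# and returns ASCII text with numeric character references.
sub utf8_bytes_to_entities {
    my ($bytes) = @_;

    # Turn the byte string into Perl characters; FB_CROAK makes
    # decode() die on input that is not valid UTF-8.
    my $text = decode('UTF-8', $bytes, Encode::FB_CROAK);

    # Replace every character outside the ASCII range with &#NNNN;
    # where NNNN is its decimal code point.
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;

    return $text;
}

# "\xE1\xB9\xA5" is the UTF-8 byte sequence for U+1E65.
print utf8_bytes_to_entities("s\xE1\xB9\xA5a"), "\n";   # prints s&#7781;a
```

If the assumption about UTF-8 input is wrong, decode() dies instead of silently producing different garbage, which at least makes it possible to check the guess against a real sample from the editor.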
