I have a DHTML editor embedded in a webpage. The output of the editor is 
handled by perl and written to text-files.

Sometimes text is pasted into the editor from a word document, and it 
contains unicode characters from sets such as Latin Extended Additional. 
(This is on a win2k machine.)

Question: I have noticed that when special unicode characters are pasted 
into the editor, they are added to the textfile in a consistent, yet 
garbled fashion (e.g. "ö" or "î"). When the textfile is represented in 
HTML, they turn out fine, just as they should, but I'd still prefer to have 
proper HTML entity numbers in the file, e.g. &#7781; (for ṥ).

The garbled characters always represent the same unicode characters, so I 
guess I could just have perl replace them with the proper html entity 
numbers before writing the text to a file.
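
A minimal sketch of that replacement, assuming (and this is only my guess, I have not confirmed it) that the garbled pairs are really UTF-8 bytes that are being displayed one byte per character. The helper name to_numeric_entities is made up for illustration; Encode is the standard module shipped with Perl 5.8 and later.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: takes the raw bytes received from the editor
# (ASSUMED to be UTF-8) and returns pure-ASCII text in which every
# non-ASCII character is written as a decimal HTML entity number.
sub to_numeric_entities {
    my ($bytes) = @_;

    # Turn the UTF-8 byte sequence into Perl characters.
    my $text = decode('UTF-8', $bytes);

    # Replace each character above U+007F with &#NNNN;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;

    return $text;
}

# "\xE1\xB9\xA5" is the UTF-8 byte sequence for U+1E65.
print to_numeric_entities("s\xE1\xB9\xA5"), "\n";   # prints s&#7781;
```

If the editor turns out to send some other encoding, the name passed to decode() would have to change accordingly.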

But I was wondering what is actually going on here: why do I get garbled 
characters in the first place? Can this somehow be "repaired" using perl 
libraries or modules that convert ordinary text, or should I do something 
with "locale"?

I realize that this is only partly a perl question, and a confused one at 
that, so apologies if this query is misdirected or, er, not particularly 
intelligible :-)

best regards,

Birgit Kellner


