I have a number of web forms that submit large blocks of text. Often the people who fill in these forms compose their text in a word processor and then copy and paste it into the web forms.
Sometimes the text submitted via these forms contains Unicode character entities, as in the following sample:
immunopathogenesis may be triggerred through Fas-, TNF-- or TGF--derived mechanisms
In this case, the string '' represents the Greek letter alpha, and '' represents beta.
My problem is that users of the data submitted via the web want these entities translated to something they can understand, but these particular entity values come from the "private use area" of the Unicode character set, so as far as I know they can't be reliably translated.
I suspect this problem starts on a Windows system, where the Greek alpha or beta is displayed with the correct glyph on the user's screen, but when the text is pasted into the text box in the browser, the character is converted to a private-use value. That's my theory, anyway.
Does anybody else recognize this phenomenon? If so, do you have a way to translate character entities that are not defined by Unicode? If Microsoft is to blame, as I suspect, do they publish a guide to their character entities anywhere?
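To make the question concrete, here is a minimal Python sketch of the kind of translation I have in mind. It finds numeric character references whose code points fall in the Basic Multilingual Plane private use area (U+E000 to U+F8FF) and replaces those that appear in a lookup table. The table is purely an assumption: it guesses that the pasted text used the Windows Symbol font and that its glyphs show up at U+F000 plus the font's byte code, so U+F061 and U+F062 stand in for alpha and beta. The names ASSUMED_SYMBOL_MAP and translate_pua_entities are made up for illustration.

```python
import re

# Basic Multilingual Plane private use area: U+E000..U+F8FF.
PUA_START, PUA_END = 0xE000, 0xF8FF

# ASSUMPTION: hypothetical table. It guesses that the source text used the
# Windows "Symbol" font and that its glyphs appear at U+F000 + byte code.
ASSUMED_SYMBOL_MAP = {
    0xF061: "\u03b1",  # Symbol 'a' -> Greek small letter alpha
    0xF062: "\u03b2",  # Symbol 'b' -> Greek small letter beta
}

# Matches decimal (&#61537;) and hexadecimal (&#xF061;) character references.
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def translate_pua_entities(text, table=ASSUMED_SYMBOL_MAP):
    """Replace private-use character references found in `table`.

    Returns (new_text, unknown), where `unknown` lists the private-use
    code points that had no table entry and were left untouched.
    """
    unknown = []

    def repl(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not (PUA_START <= cp <= PUA_END):
            return match.group(0)  # ordinary reference: leave as-is
        if cp in table:
            return table[cp]
        unknown.append(cp)  # private use, but no known translation
        return match.group(0)

    return NCR.sub(repl, text), unknown
```

Reporting the unmapped code points rather than dropping them seems safer, since a private-use value has no meaning without knowing which font or application produced it.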
Any advice would be most welcome.
Michael R. Boudreau
Senior Electronic Publishing Developer
The University of Chicago Press
1427 E. 60th Street
Chicago, IL 60637
773-753-3298