Unicode character entities in form data

Michael Boudreau Fri, 07 May 2004 12:16:05 -0700

This is not strictly an Embperl question, but it concerns a web site that I'm running with Embperl, and I hope that other web gurus out there might have some suggestions.

I have a number of web forms that submit large blocks of text. Often the people who fill in these forms compose their text in a word processor and then copy and paste it into the web forms.

Sometimes the text submitted via these forms contains Unicode character entities, as in the following sample:

   immunopathogenesis may be triggerred through Fas-, TNF-&#61537;- or
   TGF-&#61538;-derived mechanisms

In this case, the string '&#61537;' represents the Greek letter alpha, and '&#61538;' represents beta.

My problem is that users of the data submitted via the web want these entities translated to something they can understand, but these particular values (61537 and 61538, i.e. U+F061 and U+F062) fall in the "private use area" of the Unicode character set (U+E000 through U+F8FF), so as far as I know they can't be reliably translated.
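
To make the symptom concrete, here is a minimal sketch of how such references could be spotted in submitted text. It is written in Python purely for illustration and is not the Embperl code running on the site; the helper name find_pua_entities is hypothetical, and it only handles decimal references of the form shown in the sample above.

```python
import re

# Basic Multilingual Plane private use area: U+E000 through U+F8FF.
PUA_START, PUA_END = 0xE000, 0xF8FF

# Matches decimal numeric character references such as &#61537;
NUMERIC_REF = re.compile(r"&#(\d+);")


def find_pua_entities(text):
    """Return (reference, code point) pairs for references in the private use area.

    Hypothetical helper for illustration; it reports the references
    but makes no attempt to translate them.
    """
    hits = []
    for match in NUMERIC_REF.finditer(text):
        code_point = int(match.group(1))
        if PUA_START <= code_point <= PUA_END:
            hits.append((match.group(0), f"U+{code_point:04X}"))
    return hits


sample = "Fas-, TNF-&#61537;- or TGF-&#61538;-derived mechanisms"
for reference, code_point in find_pua_entities(sample):
    print(reference, code_point)
```

A check like this only flags the problem references. Deciding what each one should become still needs a mapping that Unicode itself does not supply.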

I suspect this problem starts on a Windows system, where the Greek alpha or beta is displayed with the correct glyph on the user's screen, but when the text is pasted into the text box in the browser, the character is converted to one of these private-use values. That's my theory, anyway.

Does anybody else recognize this phenomenon? If so, do you have a way to translate character entities that are not defined by Unicode? If Microsoft is to blame, as I suspect, do they happen to publish a guide to their character entities somewhere?

Any advice would be most welcome.


Michael R. Boudreau
Senior Electronic Publishing Developer
The University of Chicago Press
1427 E. 60th Street
Chicago, IL 60637
773-753-3298

