Jon,
>I have some users who enter data into my web application through one of
>three ways:
>
>- copy/paste from Microsoft Word
>- XML export from InDesign UTF-16
>- XML export from Quark
>
>In all 3 of the cases I've described above, the originating software is
>putting through characters that do not display correctly on the web.
>
>The problem I'm having is with some of the characters, such as an
>ellipsis mark or hyphen. When I run into these characters, they display
>as the wrong character... sometimes a question mark. Other times a square
>box... yet other times sequences of characters that are just totally
>crazy.
>
>My basic understanding of character encoding tells me that I want to
>reduce all of the characters down to ASCII. I do not know a good way to
>do this.
>
>How can I accept text from each of the above mentioned sources, perhaps
>others, and somehow *normalize* all of the character data into a set of
>characters that will display properly on my page every time?
One thing you'll need to do is correctly set the page encoding. The
question marks ("?") you're seeing will appear when the browser posts data
to CF and CF doesn't understand the character.
For example, if the character chr(160) (a non-breaking space) is
posted to my server and I haven't explicitly defined the encoding for the
page, the chr(160) gets interpreted as a question mark.
<cfset setEncoding("form", "iso-8859-1") />
Part of your problem may be solved by explicitly setting the encoding on
your form post pages.
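
To make that a bit more concrete, here is a rough sketch of a form page that declares one charset everywhere it can. Treat it as a starting point rather than a tested fix: the choice of UTF-8 is my assumption (an ellipsis does not exist in iso-8859-1, so UTF-8 may suit Word and InDesign input better), and the field name "body" is made up for illustration.

```cfm
<!--- Sketch only. UTF-8 is an assumption; use iso-8859-1 here to match the one-liner above. --->

<!--- Tell CF how this template file itself is encoded. --->
<cfprocessingdirective pageEncoding="utf-8">

<!--- Decode incoming form and URL data with the same charset.
      Do this before the page reads any form or URL variables. --->
<cfset setEncoding("form", "utf-8") />
<cfset setEncoding("url", "utf-8") />

<!--- Declare the same charset on the response so the browser posts back in it. --->
<cfcontent type="text/html; charset=utf-8" />

<!--- "body" is a hypothetical field name. --->
<form method="post" accept-charset="utf-8">
  <textarea name="body"></textarea>
  <input type="submit" value="Save" />
</form>
```

The point is consistency: the charset the page is served in, the charset the browser posts in, and the charset CF decodes with should all be the same one.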
-Dan
Archive:
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:253309