I'm wondering whether anyone else has seen this before, and I'm posting it
to help anyone else who runs into the same issue.

We have an outsourced writer doing textual content for products in a site
that I work on.

The writer always supplies me with the content and I add it via a CMS.

It was always in Microsoft Word format, so I know to strip out special
characters using the DeMoronize UDF found at CFLIB. The function is built
into the CMS.

But recently he has been using a newer version of Word and giving me .docx
files.

At first I didn't really notice, but recently some of the content has been
indexed by Google, and since the text is used in meta descriptions and, in
places, page titles, I started seeing new &#xxx; entities in the search
results that Google would not render and that I had not seen before.

When rendered correctly on our site, they look exactly like "smart quotes"
and several other special characters, such as the ellipsis ("...").

The DeMoronize() UDF used to remove these characters with the chr(n)
function, but the new ones look completely different and were missed by the
UDF.

These are the new entities that I am seeing. They show up fine in UTF-8
HTML, but posting them to the CMS stored them like this in MySQL, and
Google would not render them correctly in its results.

&#8220; &#8221; &#8217; &#160; &#39; &#8230; &#8482;
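
To check which numeric entities a piece of content contains before it goes
into the CMS, something like the following might help. It is only a sketch:
the function name listNumericEntities is hypothetical (it is not part of
DeMoronize), and it assumes ColdFusion 8 or later for REMatch(). It only
looks for decimal entities, not hexadecimal ones.

```cfml
<cfscript>
// Hypothetical helper, not part of the CFLIB DeMoronize() UDF.
// Returns a comma-delimited list of the distinct decimal numeric
// entities found in the text.
function listNumericEntities(text) {
    // The hash sign is doubled so ColdFusion treats it as a literal.
    var matches = REMatch("&##[0-9]+;", text);
    var found = "";
    var i = 0;
    for (i = 1; i lte ArrayLen(matches); i = i + 1) {
        // Only record each entity once.
        if (not ListFind(found, matches[i])) {
            found = ListAppend(found, matches[i]);
        }
    }
    return found;
}
</cfscript>
```

Running it over a product description that still contains Word's curly
quotes should list the quote entities once each, which makes it easy to see
what a cleanup function is still missing.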

The fix for DeMoronize() is simple.

Add these lines.

// The hash sign is doubled so ColdFusion treats it as a literal character.
text = ReReplace(text, "&##8220;", """", "All");
text = ReReplace(text, "&##8221;", """", "All");
text = ReReplace(text, "&##8217;", "'", "All");
text = ReReplace(text, "&##160;", " ", "All");
text = ReReplace(text, "&##39;", "'", "All");
text = ReReplace(text, "&##8230;", "...", "All");
text = ReReplace(text, "&##8482;", "™", "All");
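
If more entities turn up later, the same replacements could be kept in one
place and applied in a loop. This is a minimal sketch, not the CFLIB code:
the function name replaceWordEntities is hypothetical, it assumes a
ColdFusion version with array literals (9 or later to be safe), and it uses
Replace() rather than ReReplace() because the entities are literal strings.

```cfml
<cfscript>
// Hypothetical helper; the name and structure are assumptions, not CFLIB code.
function replaceWordEntities(text) {
    // Entities to look for. The hash sign is doubled so it is read literally.
    var entities = ["&##8220;", "&##8221;", "&##8217;", "&##160;", "&##39;", "&##8230;", "&##8482;"];
    // Replacement for the entity at the same position in the array above.
    var replacements = ["""", """", "'", " ", "'", "...", "™"];
    var i = 0;
    for (i = 1; i lte ArrayLen(entities); i = i + 1) {
        text = Replace(text, entities[i], replacements[i], "All");
    }
    return text;
}

// Example call: curly quotes and the ellipsis become plain ASCII.
cleaned = replaceWordEntities("&##8220;Hello&##8221;&##8230;");
</cfscript>
```

Adding a new entity then means adding one item to each array instead of
another line of code.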


Server notes:

The MySQL tables, columns, and connections all use UTF-8 with the
utf8_unicode_ci collation.

Application.cfc contains SetEncoding("form","utf-8"); and
SetEncoding("url","utf-8"); and I am using cfprocessingdirective
pageencoding="UTF-8".
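
For anyone comparing setups, the encoding settings above could sit in
Application.cfc roughly like this. It is a sketch under assumptions: the
application name is made up, and placing the SetEncoding() calls in
onRequestStart is a guess at a typical layout, not necessarily how this
site does it.

```cfml
<cfprocessingdirective pageencoding="UTF-8">
<cfcomponent output="false">

    <!--- Hypothetical application name --->
    <cfset this.name = "exampleSite">

    <cffunction name="onRequestStart" returntype="boolean" output="false">
        <cfargument name="targetPage" type="string" required="true">
        <!--- Decode incoming form and URL variables as UTF-8 --->
        <cfset SetEncoding("form", "utf-8")>
        <cfset SetEncoding("url", "utf-8")>
        <cfreturn true>
    </cffunction>

</cfcomponent>
```

Note that cfprocessingdirective only applies to the file it appears in, so
each template containing non-ASCII characters needs its own.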

Has anyone else seen this?

-- 
/Kevin Pepperman

