We use the built-in rich text editor in CF8+ to handle pasting from Word, and it seems to handle it very well. It might be an option: just use 'Paste from Word' or 'Paste as plain text'.
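If you go the rich-text-editor route, the built-in editor is switched on through the richtext attribute of cftextarea. This is a minimal sketch that assumes CF8 or later. The form name, field name and saveProduct.cfm action page are hypothetical, and the paste buttons you get depend on the toolbar set. toolbar="Default" is assumed here to include them.

```cfml
<!--- cftextarea must sit inside a cfform. Form/field names and the action page are hypothetical. --->
<cfform name="productForm" action="saveProduct.cfm" method="post">
    <!--- richtext="true" swaps the plain textarea for the built-in rich text editor --->
    <cftextarea name="productCopy" richtext="true" toolbar="Default"
                height="300" width="600"></cftextarea>
    <cfinput type="submit" name="save" value="Save">
</cfform>
```

Pasting through the editor's Word or plain-text paste lets the editor clean the markup before it reaches the CMS. A server-side cleanup like DeMoronize() is still worth keeping for content that arrives by other routes.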
Will

-----Original Message-----
From: Kevin Pepperman [mailto:[email protected]]
Sent: 25 February 2010 08:58
To: cf-talk
Subject: Cleaning Microsoft .docx special chars

I just wondered whether anyone else has seen this before, and I wanted to post it to help anyone who has the same issue.

We have an outsourced writer producing product copy for a site I work on. He always supplies the content and I add it via a CMS. It was always in Microsoft Word format, so I knew to strip out special characters using the DeMoronize UDF from CFLib; the function is built into the CMS.

Recently, though, he has been using a newer version of Word and sending me .docx files. At first I didn't notice, but some of the content has now been indexed by Google. Because the text is used in meta descriptions and, in places, page titles, I started seeing new &#xxx; entities that I hadn't seen before and that Google would not render. When rendered correctly on our site, they look exactly like "smart quotes" and several other odd characters such as "..." (the ellipsis).

The DeMoronize() UDF used to remove these with the Chr(n) function, but these look completely different and slip past the UDF.

These are the new entities I am seeing. They display fine in UTF-8 HTML, but posting them through the CMS stored them in MySQL in this form, and Google would not render them correctly in its results:

  &#8220;  (“)
  &#8221;  (”)
  &#8217;  (’)
  &#160;   (non-breaking space)
  &#39;    (')
  &#8230;  (…)
  &#8482;  (™)

The fix for DeMoronize() is simple. Add these lines:
  // "##" is an escaped "#" in CFScript strings, so "&##8220;" matches the literal entity &#8220;
  text = ReReplace(text, "&##8220;", """", "All");
  text = ReReplace(text, "&##8221;", """", "All");
  text = ReReplace(text, "&##8217;", "'", "All");
  text = ReReplace(text, "&##160;", " ", "All");
  text = ReReplace(text, "&##39;", "'", "All");
  text = ReReplace(text, "&##8230;", "...", "All");
  text = ReReplace(text, "&##8482;", "™", "All");

Server notes: the MySQL tables, columns and connections are all UTF-8, with utf8_unicode_ci collation. Application.cfc contains SetEncoding("form","utf-8"); and SetEncoding("url","utf-8");, and I am using <cfprocessingdirective pageencoding="UTF-8">.

Has anyone else seen this?

--
/Kevin Pepperman

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:331124
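If you would rather keep the entity list in one place than maintain seven separate ReReplace() calls, a table-driven helper is one option. This is a minimal sketch, not part of DeMoronize or CFLib. The name CleanDocxEntities() is hypothetical. It covers the same seven entities as Kevin's fix. It uses Replace() because the patterns are literal strings, not regular expressions. It also maps &#8482; to Chr(8482) rather than a literal ™, so the result does not depend on the source file's page encoding. It is written in CF8-compatible CFScript.

```cfml
<cfscript>
// Hypothetical helper (not part of DeMoronize or CFLib): replaces the numeric
// entities seen in .docx-sourced text with plain stand-ins.
function CleanDocxEntities(text) {
    // CF8 requires var declarations at the top of the function body.
    var entityMap = StructNew();
    var key = "";

    // "##" is an escaped "#" in CFScript strings, so "&##8220;" is the literal "&#8220;".
    entityMap["&##8220;"] = """";      // left double quotation mark -> "
    entityMap["&##8221;"] = """";      // right double quotation mark -> "
    entityMap["&##8217;"] = "'";       // right single quotation mark -> '
    entityMap["&##160;"]  = " ";       // non-breaking space -> ordinary space
    entityMap["&##39;"]   = "'";       // apostrophe entity -> '
    entityMap["&##8230;"] = "...";     // horizontal ellipsis -> three dots
    entityMap["&##8482;"] = Chr(8482); // trade mark sign as a real character

    // Replace() does a literal, case-sensitive match, so nothing needs regex escaping.
    // Iteration order does not matter: no replacement value contains "&#".
    for (key in entityMap) {
        text = Replace(text, key, entityMap[key], "all");
    }
    return text;
}
</cfscript>
```

For example, CleanDocxEntities("&##8220;Hello&##8221; it&##8217;s&##8230;") returns "Hello" it's... with straight quotes. You could call it at the end of DeMoronize(), or in place of the ReReplace() lines, so that new entities only need to be added to the struct.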

