>>What do you folks have experience with that might be worth looking at?
It was quite a while ago, but I found there was too many flaws the way
FCK was designed.
1. FCK only had Word cleanup as an option. If the user doen't use it,
you will get all the garbage anyway.
2. all other ways to import text (ie: cut'n paste, drag'n drop,
right-click and paste...) would bypass the cleaner.
So I designed my own editor in which MSWord cleanup is mandatory.
Like in FCKeditor, I use Regexp in the process, but with many more
expressions and more cleaning.
Here is the first par of the cleaning function (the whole thing will be
cut by the CF_talk system)
function cleanWord (html)
// cleans pasted text from Word
{
//alert(html)
html = html.replace(/<o:p>\s*<\/o:p>/g, "") ;
html = html.replace(/<o:p>.*?<\/o:p>/g, "") ;
// Remove mso-xxx styles.
html = html.replace( /\s*mso-[^:]+:[^;"]+;?/gi, "" ) ;
// Remove margin styles.
html = html.replace( /\s*MARGIN: 0cm 0cm 0pt\s*;/gi, "" ) ;
html = html.replace( /\s*MARGIN: 0cm 0cm 0pt\s*"/gi, "\"" ) ;
html = html.replace( /\s*TEXT-INDENT: 0cm\s*;/gi, "" ) ;
html = html.replace( /\s*TEXT-INDENT: 0cm\s*"/gi, "\"" ) ;
html = html.replace( /\s*TEXT-ALIGN: [^\s;]+;?"/gi, "\"" ) ;
html = html.replace( /\s*PAGE-BREAK-BEFORE: [^\s;]+;?"/gi, "\"" ) ;
html = html.replace( /\s*FONT-VARIANT: [^\s;]+;?"/gi, "\"" ) ;
html = html.replace( /\s*tab-stops:[^;"]*;?/gi, "" ) ;
html = html.replace( /\s*tab-stops:[^"]*/gi, "" ) ;
html = html.replace( /\s*FONT-FAMILY:[^;"]*;?/gi, "" ) ;
// Remove Class attributes
html = html.replace(/<(\w[^>]*)\s*class=([^ |>]*)([^>]*)/gi, "<$1$3") ;
// Remove styles.
html = html.replace( /<(\w[^>]*)style="([^\"]*)"([^>]*)/gi, "<$1$3" ) ;
// Remove empty styles.
html = html.replace( /\s*style="\s*"/gi, '' ) ;
html = html.replace( /<SPAN[^>]*>\s* \s*<\/SPAN>/gi, ' ' ) ;
html = html.replace( /<SPAN[^>]*>\s*<\/SPAN>/gi, '' ) ;
// Remove Lang attributes
html = html.replace(/<(\w[^>]*) lang=([^ |>]*)([^>]*)/gi, "<$1$3") ;
html = html.replace( /<SPAN\s*>([\s\S]*?)<\/SPAN>/gi, '$1' ) ;
html = html.replace( /<SPAN\s*>([\s\S]*?)<\/SPAN>/gi, '$1' ) ;
html = html.replace( /<SPAN\s*>([\s\S]*?)<\/SPAN>/gi, '$1' ) ;
// remove all font tags
html = html.replace( /<\/?FONT[^>]*>/gi, '' ) ;
html = html.replace( /<\/?FONT[^>]*>/gi, '' ) ;
html = html.replace( /<\/?FONT[^>]*>/gi, '' ) ;
html = html.replace( /<\/?DIV([^>]*)>/gi, '' ) ;
// Remove XML elements and declarations
html = html.replace(/<\\?\?xml[^>]*>/gi, "") ;
// Remove Tags with XML namespace declarations: <o:p></o:p>
html = html.replace(/<\/?\w+:[^>]*>/gi, "") ;
html = html.replace( /<H\d>\s*<\/H\d>/gi, '' ) ;
....
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
Adobe® ColdFusion® 8 software 8 is the most important and dramatic release to
date
Get the Free Trial
http://ad.doubleclick.net/clk;207172674;29440083;f
Archive:
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:317093
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe:
http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=11502.10531.4