> Has anyone found a fluid cost-free process to achieve clean output from
> pasting via Word docs? HTML Tidy is nice but doesn't really cut it as part
> of a client process.
1. Use WVWare to save the Word file as HTML (WVWare has to be installed on the server, btw)
2. str_replace() out the empty content (WVWare has a tendency to leave empty paragraphs in there for no good purpose whatsoever; see the example of that output below)
3. Parse with HTML Tidy as XHTML, then do a manual XML parse over your content and rebuild it according to rules you set yourself (i.e. only content you approve gets put back into the rebuilt output)

Voilà. Super-duper clean output from Word :)

Example of the empty crap that WVWare leaves as residue:

<p><div name="Default" align="left" style=" padding: 0.00mm 0.00mm 0.00mm 0.00mm; ">
<p style="text-indent: 0.00mm; text-align: left; line-height: 4.166667mm; color: Black; background-color: White; "> </p></div>

-- 
Faruk Ates
Web consultant, designer, developer and project manager
www.kurafire.net - www.mediadesign.nl - www.happyclog.com

*********************************************************
The CMS discussion list for http://webstandardsgroup.org/
*********************************************************
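Below is a minimal Python sketch of steps 2 and 3 (the WVWare run and the Tidy run itself are not covered). It makes the following assumptions:

* Tidy has already produced well-formed XHTML, for example by running it with its `output-xhtml` and `numeric-entities` options. The second option matters because ElementTree does not know named entities such as `&nbsp;`.
* The post's step 2 uses PHP's `str_replace()` on the raw string. As a swapped-in alternative, this sketch prunes empty elements on the parsed tree after rebuilding. That also catches whitespace-only paragraphs like the WVWare residue above.
* `ALLOWED_TAGS`, `ALLOWED_ATTRS` and all function names are hypothetical. Your own rules go there.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: the only tags allowed back into the rebuilt output.
ALLOWED_TAGS = {"p", "strong", "em", "ul", "ol", "li", "a", "h2", "h3"}
# Hypothetical per-tag attribute whitelist; everything else (style, align, name, ...) is dropped.
ALLOWED_ATTRS = {"a": {"href"}}


def local_name(tag):
    """Strip an ElementTree namespace prefix, e.g. '{http://www.w3.org/1999/xhtml}p' -> 'p'."""
    return tag.rsplit("}", 1)[-1]


def append_text(parent, text):
    """Append text at the current end of parent's content (its text, or its last child's tail)."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def rebuild(src, dst):
    """Copy src's children into dst, keeping whitelisted tags and unwrapping all others."""
    for child in src:
        name = local_name(child.tag)
        if name in ALLOWED_TAGS:
            keep = ALLOWED_ATTRS.get(name, set())
            attrs = {k: v for k, v in child.attrib.items() if k in keep}
            new = ET.SubElement(dst, name, attrs)
            new.text = child.text
            rebuild(child, new)
            new.tail = child.tail
        else:
            # Disallowed element (e.g. <div>, <span>): keep its text and children, drop the tag.
            append_text(dst, child.text)
            rebuild(child, dst)
            append_text(dst, child.tail)


def prune_empty(parent):
    """Bottom-up: remove childless elements whose text is empty or whitespace-only."""
    for child in list(parent):
        prune_empty(child)
        if len(child) == 0 and not (child.text or "").strip():
            # Keep the whitespace and tail text so neighbouring words are not glued together.
            leftover = (child.text or "") + (child.tail or "")
            idx = list(parent).index(child)
            if idx > 0:
                prev = parent[idx - 1]
                prev.tail = (prev.tail or "") + leftover
            else:
                parent.text = (parent.text or "") + leftover
            parent.remove(child)


def clean(xhtml):
    """Parse Tidy's XHTML, rebuild the <body> content from the whitelist, prune empty elements."""
    root = ET.fromstring(xhtml)
    body = next((e for e in root.iter() if local_name(e.tag) == "body"), root)
    out = ET.Element("div")  # temporary container; only its contents are returned
    rebuild(body, out)
    prune_empty(out)
    inner = (out.text or "") + "".join(ET.tostring(c, encoding="unicode") for c in out)
    return inner.strip()
```

Call `clean(tidy_output)` on the XHTML string Tidy returns. Disallowed wrappers such as `<div>` and `<span>` are unwrapped rather than deleted, so their text survives while the `style`, `align` and `name` attributes disappear.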
