> Has anyone found a fluid cost-free process to achieve clean output from
> pasting via Word docs? HTML Tidy is nice but doesn't really cut it as part
> of a client process.

1. wvWare -> convert the Word file to HTML (this has to be installed on the
server, btw)
2. str_replace() out the empty content (it tends to leave some empty
paragraphs in there for no good reason whatsoever; see the example of that
output below)
3. Parse with HTML Tidy as XHTML, then do a manual XML parse over your
content and rebuild it according to rules you set yourself (i.e. only put
content back into the rebuilt document if you approve of it)
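
The rebuild in step 3 could look roughly like the sketch below. It is written in Python rather than the PHP that str_replace() implies, and it is an illustration only, not the actual code behind this process. The whitelist ALLOWED and the function names are hypothetical, and it assumes Tidy has already produced well-formed XHTML with no named entities such as &nbsp; (the XML parser would reject those).

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: tag name -> attributes that may survive the rebuild.
ALLOWED = {
    "p": set(), "em": set(), "strong": set(),
    "ul": set(), "ol": set(), "li": set(),
    "a": {"href"},
}

def _append_text(parent, text):
    """Add text at the current end of parent (after its last child, if any)."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text

def _copy(src, parent):
    """Copy approved descendants of src into parent; unwrap everything else."""
    _append_text(parent, src.text)
    for child in src:
        # Tidy's XHTML carries a namespace, so compare local tag names only.
        tag = child.tag.rsplit("}", 1)[-1]
        if tag in ALLOWED:
            attrs = {k: v for k, v in child.attrib.items() if k in ALLOWED[tag]}
            _copy(child, ET.SubElement(parent, tag, attrs))
        else:
            # Unapproved element: keep its text and children, drop the tag.
            _copy(child, parent)
        _append_text(parent, child.tail)

def clean(xhtml):
    """Rebuild xhtml inside a fresh <div>, keeping only whitelisted markup."""
    out = ET.Element("div")
    _copy(ET.fromstring(xhtml), out)
    return ET.tostring(out, encoding="unicode")
```

Unwrapping unapproved tags instead of deleting them keeps the text the author typed while dropping Word's styling; deleting whole elements would be the stricter variant.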

Voilà. Super-duper clean output from Word :)

Example of the empty crap that wvWare leaves as residue:


<p><div name="Default" align="left" style="  padding: 0.00mm 0.00mm 0.00mm
0.00mm; ">

<p style="text-indent: 0.00mm; text-align: left; line-height: 4.166667mm;
color: Black; background-color: White; ">

</p></div>
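
A rough sketch of stripping that kind of residue, again in Python rather than PHP and using a regular expression where the process above uses a literal str_replace(). The names EMPTY_BLOCK and strip_empty_blocks are made up for this example. It only removes matched, whitespace-only <p> and <div> pairs, so an unclosed opening tag like the leading <p> above is left alone.

```python
import re

# A <p> or <div> (with any attributes, even wrapped over several lines)
# whose content is nothing but whitespace.
EMPTY_BLOCK = re.compile(r"<(p|div)\b[^>]*>\s*</\1\s*>", re.IGNORECASE)

def strip_empty_blocks(html):
    """Remove empty <p>/<div> pairs, repeating until nothing changes."""
    previous = None
    while previous != html:
        previous = html
        # Removing an inner empty block can leave its wrapper empty,
        # which the next pass then picks up.
        html = EMPTY_BLOCK.sub("", html)
    return html
```

The loop matters for nested cases: the inner empty paragraph goes first, and only then does the surrounding div count as empty.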


-- 
Faruk Ates
Web consultant, designer, developer and project manager
www.kurafire.net - www.mediadesign.nl - www.happyclog.com
*********************************************************
The CMS discussion list for http://webstandardsgroup.org/
*********************************************************
