On Thursday, 22 March 2007, Jeremy C. Reed wrote:
> The HTML document has various characters like
> \xa0
> © \xa9 (Copyright symbol)
> (and others).
>
> I tried using html2text.py but it didn't like these characters.
>
> Any ideas on how I can use iconv or another tool to convert documents like
> this so I can then convert to Markdown?
>
> I don't want to do this manually, as I have around 500+ documents.
>
> Jeremy C. Reed

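For the iconv question itself, here is a minimal sketch of a batch re-encoding pass, written in Python (the language of html2text.py) rather than as an iconv or PHP loop. It assumes the files are ISO-8859-1 (Latin-1), which the byte values \xa0 and \xa9 suggest but do not prove (Windows-1252 maps those two bytes the same way). The function name `batch_convert` and its parameters are hypothetical, and `convert` is a placeholder for whatever HTML-to-Markdown step you use.

```python
import pathlib


def batch_convert(src_dir, dst_dir, convert, src_encoding="latin-1", out_suffix=".md"):
    """Hypothetical helper: re-encode (and optionally convert) every *.html file.

    Each file under src_dir is decoded from src_encoding (ASSUMED Latin-1),
    passed through the `convert` callable, and written as UTF-8 under
    dst_dir, mirroring the original folder layout. Returns the written paths.
    """
    src = pathlib.Path(src_dir)
    dst = pathlib.Path(dst_dir)
    written = []
    for path in sorted(src.rglob("*.html")):
        # Latin-1 maps every byte to a code point, so this decode never fails:
        # 0xA0 becomes U+00A0 (no-break space), 0xA9 becomes U+00A9 (©).
        text = path.read_text(encoding=src_encoding)
        out_path = dst / path.relative_to(src).with_suffix(out_suffix)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        # Always write UTF-8 so a Unicode-aware converter sees clean input.
        out_path.write_text(convert(text), encoding="utf-8")
        written.append(out_path)
    return written
```

With `convert=lambda text: text` and `out_suffix=".html"` this amounts to a recursive `iconv -f ISO-8859-1 -t UTF-8` over the whole tree; passing a real HTML-to-Markdown function instead would do both steps in one pass. If some files turn out to be UTF-8 already, decoding them as Latin-1 would garble them, so check a sample first.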
As far as I understand, you are looking for a converter that supports UTF-8 / Unicode characters? My PHP script (ported from html2text.py) leaves those characters untouched, so in theory it should work. Try it out at [1].

But: it's PHP, so unless you have command-line access or write a small PHP script to run locally, it won't be of any use to you. The latter should be pretty easy, though: simply recurse through your files and folders, apply html2text to each file, and save the output somewhere. You might want to allow longer execution times for PHP scripts in the meantime.

Another option would be one of the other converters. I know there are some, but I don't have their URLs at hand; maybe someone else on the list can help you.

[1]: http://milianw.de/projects/html2text/

-- 
Milian Wolff
http://milianw.de

_______________________________________________
Markdown-Discuss mailing list
http://six.pairlist.net/mailman/listinfo/markdown-discuss
