Re: converting html with \xa9 to Markdown and using iconv?

Jeremy C. Reed Thu, 22 Mar 2007 13:31:21 -0800

> > ?   \xa9  (Copyright symbol)

> As far as I understand you, you are looking for a converter which supports 
> UTF-8 / Unicode characters?


Maybe. But now that I think about it more, I'd prefer that some characters 
get converted to the HTML entity, like &copy;

I found the Perl module HTML::Entities, but I can't get its 
encode_entities() to do what I want.

I may give your script a try, but I didn't have PHP on my workstation. (I 
did have it on the system I am migrating the data to, so I guess I could 
do all my work there instead.)

My original documents are in XML and have entities like &#xE9;, but the 
generated HTML just has the single character, which is breaking 
html2text.py. I thought I could convert the characters back with Perl, using 
perl -pe "s/([\x80-\xff])/'&#' . ord($1) . ';'/eg;" But that failed from 
the command line. It worked in a Perl script though.
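
For illustration only, here is a minimal sketch of the character-to-entity 
conversion described above, written in Python rather than Perl. The function 
name encode_non_ascii is hypothetical and is not part of html2text.py or 
HTML::Entities. It assumes the HTML has already been decoded to text (for 
example from Latin-1), and it prefers a named entity such as &copy; when 
Python's standard html.entities table has one. As for the one-liner, a likely 
but unconfirmed reason it fails at a shell prompt is that the shell expands 
$1 inside double quotes before Perl ever sees it.

```python
from html.entities import codepoint2name


def encode_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with an HTML entity.

    Hypothetical helper: ASCII characters (including markup such as
    '<' and '&') pass through unchanged, so existing tags survive.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII: keep as-is.
            out.append(ch)
        elif cp in codepoint2name:
            # A named entity exists, e.g. 169 -> "copy".
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No named entity: fall back to a decimal reference.
            out.append(f"&#{cp};")
    return "".join(out)


if __name__ == "__main__":
    print(encode_non_ascii("<p>\xa9 2007 caf\xe9</p>"))
    # prints: <p>&copy; 2007 caf&eacute;</p>
```

Unlike the [\x80-\xff] byte range in the Perl attempt, this works on decoded 
characters, so the input encoding has to be known before calling it.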

But then I see that html2text.py converts entities to literal text, 
like "(C)" for &#169;. I don't want that either. I will have to try your 
tool next. Thanks.

  Jeremy C. Reed

_______________________________________________
Markdown-Discuss mailing list
[email protected]
http://six.pairlist.net/mailman/listinfo/markdown-discuss
