On Fri, 10 Jan 2003 15:27, Damon Lynch wrote:
> Hi,
>
> I need to convert a lot of text sent in e-mails and MS Word documents
> into plain text format, to be fed into a python script and then
> e-mailed.  I want the final product to be plain ASCII text i.e. no fancy
> em hyphens, curly quotes and so forth.
>
> One big problem currently is that when I copy-n-paste characters like
> curly quotes or em hyphens from OpenOffice.org into gedit or kate, the
> characters show up obviously incorrect.  e.g. a capital A with a bar on
> top.  When looking at them in python strings, these are some examples:
> \xe2\x80\x99 (single quote)
> \xe2\x80[\x9c\x9d] (RE of opening and closing double quote)
> \x93 (another curly quote)
>
> Is there a utility program in Linux to convert these characters?  Or is
> there a library in Python that will do it for me (instead of me using
> RE's to substitute them)?
>
> Many thanks,
> Damon

You may be looking for something like demoroniser:
http://www.fourmilab.ch/webtools/demoroniser/

It is designed specifically for HTML pages, but if you have any experience
in Perl you could hack it to work on text files saved from OpenOffice.org.
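
If you would rather stay in Python, a rough sketch along these lines might
do the job. It is not how demoroniser works, and it makes some assumptions:
Python 3, input that is either UTF-8 (the \xe2\x80\x99 style sequences) or
Windows-1252 (the lone \x93), and a replacement table that is only a
starting point. The names PUNCTUATION and to_ascii are made up for the
example.

```python
# Hypothetical replacement table; extend it as new characters turn up.
PUNCTUATION = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote (UTF-8 bytes \xe2\x80\x99)
    "\u201c": '"',    # left double quote  (UTF-8 bytes \xe2\x80\x9c)
    "\u201d": '"',    # right double quote (UTF-8 bytes \xe2\x80\x9d)
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
}
TABLE = str.maketrans(PUNCTUATION)


def to_ascii(raw: bytes) -> str:
    """Decode pasted text and replace typographic punctuation with ASCII."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone \x93 is not valid UTF-8; assume Windows-1252 instead.
        text = raw.decode("cp1252", errors="replace")
    text = text.translate(TABLE)
    # Anything still outside ASCII becomes "?" so it is easy to spot.
    return text.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii(b"It\xe2\x80\x99s \xe2\x80\x9cquoted\xe2\x80\x9d"))
    print(to_ascii(b"\x93another curly quote\x94"))
```

Decoding first and mapping whole characters avoids having to write regular
expressions against the raw byte sequences.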

-- 
Michael

