Todd Gruhn wrote:
> I extracted the "text" from a large PDF using a NetBSD prog called
> pdftotext(1).

pdftotext is really awesome.  I find that "pdftotext -layout" does a
truly excellent job with most of the PDF files I need to deal with from
banks and things here.

> I got the desired ASCII text, but it has many occurrences of the sequence
> \x{80}\x{9c} ... \x{80}\x{9d}

Do you know what charset that is in natively?

> Is there a nice and universal utility that can convert these to ASCII chars?
> Someone mentioned EMACS...  What about in pkgsrc?

I'll be honest and say I did not look, but on another system I
routinely use "iconv" for this type of thing.  I will cross my fingers
and hope it is available in pkgsrc.

  iconv -f UTF-8 -t ASCII//TRANSLIT <filein >fileout

That assumes UTF-8 in and ASCII out, but depending on the real charset
of the file you may need a different code set or code page on one side
or the other, for example:

  iconv -f CP1252 -t UTF-8 <filein >fileout

Hopefully even if incomplete it might still be useful.

Bob
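
P.S.  A guess on the charset: \x{80}\x{9c} and \x{80}\x{9d} look like
the last two bytes of the UTF-8 encodings of the curly double quotes
(U+201C is E2 80 9C and U+201D is E2 80 9D), so the file may well be
UTF-8 already.  As far as I know //TRANSLIT is a GNU extension, so in
case the iconv you end up with does not support it, here is a minimal
sketch that swaps only those two characters for plain ASCII quotes.  It
assumes UTF-8 input, and straighten_quotes is just a name I made up.

```shell
#!/bin/sh
# Sketch only: assumes the input is UTF-8 and that only the curly double
# quotes (U+201C = E2 80 9C, U+201D = E2 80 9D) need replacing.
# straighten_quotes is a made-up helper name, not part of any package.
straighten_quotes() {
    lq=$(printf '\342\200\234')   # UTF-8 bytes of the left curly quote
    rq=$(printf '\342\200\235')   # UTF-8 bytes of the right curly quote
    # LC_ALL=C makes sed treat the input as plain bytes.
    LC_ALL=C sed -e "s/$lq/\"/g" -e "s/$rq/\"/g"
}

# Example: a line containing both curly quotes.
printf 'He said \342\200\234hello\342\200\235 twice\n' | straighten_quotes
# prints: He said "hello" twice
```

To run it on a real file, replace the example line with
"straighten_quotes <filein >fileout".  Other non-ASCII characters (curly
single quotes, dashes, ligatures) would pass through untouched, so iconv
is still the more general tool.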