Thanks for the idea, Ignatios. I will try this.

On Sat, Jan 16, 2021 at 3:00 PM <[email protected]> wrote:
>
> Hi,
>
> On Sat, Jan 16, 2021 at 01:45:45PM -0500, Todd Gruhn wrote:
> > I have a large document (18,000 lines). It is full of tags such as
> > <93>, <94>, <95>.
> >
> > If I view the doc in a Perl editor I see \x{93}, \x{94}, \x{95} ...
>
> Ahem - are you sure (have you looked at a few of them with hexdump -C)?
>
> Your Perl editor displays \x{93}, your other editor <93>; in reality
> they might be just one octet with that value.
> Sounds like some windows-1252, where they're “, ” and •, respectively.
>
> > Is there a pkg or command to strip these tags and leave the text?
>
> In that case I'd try
>
>     iconv -f windows-1252 -t utf-8 < foo > bar
>
> Regards,
> -is
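
For the archive, here is a rough Python sketch of the same two steps: first count the high octets (roughly what one would look for in hexdump -C output), then convert from windows-1252 to UTF-8 as the iconv call does. The function names and the sample bytes are made up for illustration, and it assumes the file really is windows-1252, which I have not confirmed yet.

```python
from collections import Counter


def high_bytes(data: bytes) -> Counter:
    """Count every octet >= 0x80 in the input (hypothetical helper)."""
    return Counter(b for b in data if b >= 0x80)


def cp1252_to_utf8(data: bytes) -> bytes:
    """Decode as windows-1252 and re-encode as UTF-8 (hypothetical helper).

    Raises UnicodeDecodeError on the few octets that Python's cp1252
    codec leaves undefined (for example 0x81), which would hint that the
    input is not windows-1252 after all.
    """
    return data.decode("cp1252").encode("utf-8")


# Made-up sample containing the three octets from the thread.
sample = b"\x93quoted\x94 \x95 item"

# Show which high octets occur and how often.
print({hex(b): n for b, n in high_bytes(sample).items()})

# Show the converted text.
print(cp1252_to_utf8(sample).decode("utf-8"))
```

For a real file, the input would be read in binary mode (open("foo", "rb")) and the result written in binary mode to bar, so that no other decoding happens along the way.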
