Re: [9fans] More 'Sam I am'

Lyndon Nerenberg Wed, 08 Feb 2006 10:15:15 -0800

> Hmm.  I'm going to make an unpopular but pragmatic suggestion: Don't use
> sed or sam, but instead, use a language with an HTML parser available.
> There are some jobs for which regular expressions aren't the best tool;
> I personally think this is one of them.  Here's a script I posted to
> USENET years ago to extract data from a table.

The problem with this is the data I want is interspersed with data that
I don't want. And the bits I don't want are variable length
inconsistent multi-line text that is a bitch to filter out of the
rendered output stream. It turns out that sam (against the raw HTML)
was the only tool that was able to do the job. I just wish I could wrap
it in a shell script that I could throw at the directory containing all
the .html files.
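
For what it's worth, one way such a wrapper might look is sketched below in Python rather than rc. It assumes sam is on the PATH and relies on sam's -d flag, which skips the terminal part and reads commands from standard input. The command script in SAM_COMMANDS is only a placeholder (the commands actually used are not given here), and the helper names html_files, sam_argv and run_sam are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Placeholder sam command script; the commands actually used are not given
# in this thread. This one prints every <title>...</title> match on a line.
SAM_COMMANDS = ",x/<title>.*<\\/title>/ p\n"


def html_files(directory):
    """Return the .html files directly inside directory, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix == ".html"
    )


def sam_argv(path, sam="sam"):
    """Build the argument list for a terminal-less (-d) sam run on one file."""
    return [sam, "-d", str(path)]


def run_sam(path, commands=SAM_COMMANDS):
    """Feed the command script to sam on stdin and return what it printed."""
    result = subprocess.run(
        sam_argv(path), input=commands, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # Usage (script name is hypothetical): python samdir.py /path/to/html
    for path in html_files(sys.argv[1]):
        sys.stdout.write(run_sam(path))
```

Each file gets its own sam process, so a command script that works on one file interactively should carry over unchanged.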


--lyndon
