Hmm. I'm going to make an unpopular but pragmatic suggestion: Don't use sed or sam; instead, use a language that has an HTML parser available. There are some jobs for which regular expressions aren't the best tool, and I personally think this is one of them. Here's a script I posted to USENET years ago to extract data from a table.
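The USENET script itself isn't reproduced here. As a rough illustration of the parser-based approach (a minimal sketch, not that script), here is a version using Python's standard-library `html.parser`. It assumes the wanted data sits in `<td>`/`<th>` cells, tolerates omitted `</td>`/`</tr>` end tags, does not handle nested tables, and prints one tab-separated line per row for every `.html` file in a directory. The class and function names are made up for the example.

```python
import sys
from html.parser import HTMLParser
from pathlib import Path


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Nested tables are not handled: an inner <tr> ends the outer row.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of cell strings
        self._row = None   # cells of the currently open <tr>, if any
        self._cell = None  # text fragments of the currently open cell, if any

    def _end_cell(self):
        if self._cell is not None:
            # Collapse whitespace, including line breaks inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()   # an unclosed previous <tr> ends here
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # an unclosed previous cell ends here
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html_text):
    """Return the table rows found in html_text as lists of strings."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    directory = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    for path in sorted(directory.glob("*.html")):
        for row in extract_rows(path.read_text(errors="replace")):
            print(path.name, *row, sep="\t")
```

Saved as, say, `tables.py`, it would be run as `python3 tables.py /path/to/dir`. Dropping the unwanted data then becomes a test on each row's cells, not a regular expression over the raw markup.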
The problem with this is that the data I want is interspersed with data I don't want. And the bits I don't want are variable-length, inconsistent, multi-line text that is a bitch to filter out of the rendered output stream. It turns out that sam (run against the raw HTML) was the only tool able to do the job. I just wish I could wrap it in a shell script that I could throw at the directory containing all the .html files.
--lyndon
