yuk
On 2/10/06, Aharon Robbins <[EMAIL PROTECTED]> wrote:
> In article <[EMAIL PROTECTED]> you write:
> >On Wed, Feb 08, 2006 at 10:14:53AM -0800, Lyndon Nerenberg wrote:
> >> The problem with this is the data I want is interspersed with data that
> >> I don't want. And the bits I don't want are variable length
> >> inconsistent multi-line text that is a bitch to filter out of the
> >> rendered output stream. It turns out that sam (against the raw HTML)
> >> was the only tool that was able to do the job. I just wish I could wrap
> >> it in a shell script that I could throw at the directory containing all
> >> the .html files.
> >
> >I'm not talking about rendering, just parsing. Well, ultimately,
> >what's important is that you get what you need out of the solution, I
> >guess. Still, regular expressions alone give you part of the story,
> >but not the whole thing. I submit that the power to actually parse
> >the tokens in the data as opposed to just matching them (even if the
> >regular expression language you're using is powerful enough to match
> >the structure of the document) is more powerful. But hey, if sam
> >floats your boat, fish on that river!
> >
> > - Dan C.
>
> Possibly of interest is the xmlgawk project:
>
> http://www.sourceforge.net/projects/xmlgawk
>
> This is an extended version of GNU Awk with an XML parser module add-on.
> The idea that instead of reading lines, you get XML tokens (tags, fields
> in the tags, and marked-up data). I am not directly involved in it, but
> it looks like a rather promising alternative for people who would like
> to process XML type data in the more traditional Unixy fashion.
>
> Arnold
> --
> Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
> P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
> Nof Ayalon Cell Phone: +972 50 729-7545
> D.N. Shimshon 99785 ISRAEL
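
For anyone unsure what "you get XML tokens instead of lines" might look like in practice, here is a rough sketch in Python. It is not xmlgawk and does not reflect its actual interface; the function name xml_tokens and the (kind, value, attrs) tuple layout are made up for illustration. It uses the standard library's expat bindings and assumes well-formed XML, so raw real-world HTML would likely need a more forgiving parser.

```python
import xml.parsers.expat


def xml_tokens(text):
    """Return a list of (kind, value, attrs) tokens for a well-formed XML string.

    Hypothetical helper for illustration only: kind is "start", "end", or
    "data"; attrs is a dict of tag attributes (empty for "end" and "data").
    """
    tokens = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # merge adjacent character-data chunks

    def on_start(name, attrs):
        tokens.append(("start", name, attrs))

    def on_end(name):
        tokens.append(("end", name, {}))

    def on_data(data):
        # Skip whitespace-only runs between tags.
        if data.strip():
            tokens.append(("data", data.strip(), {}))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_data
    parser.Parse(text, True)  # True marks this as the final chunk
    return tokens


sample = '<list><item id="1">foo</item><item id="2">bar</item></list>'
for kind, value, attrs in xml_tokens(sample):
    print(kind, value, attrs)
```

The point is the same one Dan makes above: the record you act on is a tag, an attribute set, or a run of marked-up data, not a line that happens to match a pattern.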
