Re: [9fans] More 'Sam I am'

Bruce Ellis Fri, 10 Feb 2006 04:59:38 -0800

yuk


On 2/10/06, Aharon Robbins <[EMAIL PROTECTED]> wrote:
> In article <[EMAIL PROTECTED]> you write:
> >On Wed, Feb 08, 2006 at 10:14:53AM -0800, Lyndon Nerenberg wrote:
> >> The problem with this is the data I want is interspersed with data that
> >> I don't want.  And the bits I don't want are variable length
> >> inconsistent multi-line text that is a bitch to filter out of the
> >> rendered output stream.  It turns out that sam (against the raw HTML)
> >> was the only tool that was able to do the job.  I just wish I could wrap
> >> it in a shell script that I could throw at the directory containing all
> >> the .html files.
> >
> >I'm not talking about rendering, just parsing.  Well, ultimately,
> >what's important is that you get what you need out of the solution, I
> >guess.  Still, regular expressions alone give you part of the story,
> >but not the whole thing.  I submit that the power to actually parse
> >the tokens in the data as opposed to just matching them (even if the
> >regular expression language you're using is powerful enough to match
> >the structure of the document) is more powerful.  But hey, if sam
> >floats your boat, fish on that river!
> >
> >       - Dan C.
>
> Possibly of interest is the xmlgawk project:
>
>        http://www.sourceforge.net/projects/xmlgawk
>
> This is an extended version of GNU Awk with an XML parser module add-on.
> The idea is that instead of reading lines, you get XML tokens (tags, fields
> in the tags, and marked-up data).  I am not directly involved in it, but
> it looks like a rather promising alternative for people who would like
> to process XML type data in the more traditional Unixy fashion.
>
> Arnold
> --
> Aharon (Arnold) Robbins --- Pioneer Consulting Ltd.     arnold AT skeeve DOT com
> P.O. Box 354            Home Phone: +972  8 979-0381    Fax: +1 206 350 8765
> Nof Ayalon              Cell Phone: +972 50  729-7545
> D.N. Shimshon 99785     ISRAEL
>
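
For anyone who wants to see the "tokens instead of lines" idea from the
quoted message in concrete form, here is a minimal sketch. It is not xmlgawk;
it uses Python's standard html.parser module as a stand-in, and the names
TokenCollector and extract_text are made up for illustration. It assumes
the wanted data sits inside one known tag and that everything else
(including multi-line junk) can be dropped.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collect text found inside one tag, ignoring everything else.

    Hypothetical helper: the parser hands us tokens (start tag, end tag,
    character data) rather than lines, so multi-line junk outside the
    wanted tag never needs a regular expression to skip it.
    """

    def __init__(self, wanted_tag):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.depth = 0      # how many wanted tags we are currently inside
        self.chunks = []    # text collected so far

    def handle_starttag(self, tag, attrs):
        if tag == self.wanted_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.wanted_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that appears inside the wanted tag.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, wanted_tag):
    """Return the list of text chunks found inside wanted_tag."""
    parser = TokenCollector(wanted_tag)
    parser.feed(html)
    parser.close()
    return parser.chunks
```

To throw it at a directory of .html files, one could loop over
pathlib.Path(".").glob("*.html") and call extract_text(path.read_text(), "td")
on each; the tag name "td" is only an example and would depend on the
actual pages.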
