> I would like to be able to write scripts like this:
> 
>       load "http://apc-reset/outlets.htm";
>       find "yoshimi"
>       nearest option, set "Immediate Reboot"
>       submit
> 
> or like this:
> 
>       load "http://www.fedex.com/Tracking";
>       find form
>       enter "792544024753"
>       submit
>       
>       if (find "No information") {
>          select enclosing td
>          print
>       } else if (find "Ship date") {
>          select enclosing table
>          select enclosing table
>          print
>       } else {
>          print ">>> Unexpected Results\n"
>          print
>       }
> 
> Does anyone know of programs/languages that let you
> script web sessions like that?  Searching around finds lots
> of mentions of web scraping but no actual programs.
> 
> I have a rough idea of the general structure of the language
> and grammar, and I think that libhtml does most of the
> heavy lifting already.
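
Something in that spirit is easy enough to fake today with an
ad-hoc script.  Here is a rough sketch of the second session in
plain Python, assuming the tracking page accepts a single posted
form field; the field name below is a hypothetical stand-in, not
FedEx's real interface:

    import urllib.parse
    import urllib.request

    # post the tracking number (the field name is made up)
    query = {"trackingnumber": "792544024753"}
    data = urllib.parse.urlencode(query).encode()
    url = "http://www.fedex.com/Tracking"
    with urllib.request.urlopen(url, data) as resp:
        page = resp.read().decode("latin-1", errors="replace")

    # crude string matching stands in for "find"
    if "No information" in page:
        print("no information")      # really want the enclosing td
    elif "Ship date" in page:
        print("shipment details")    # really want the enclosing tables
    else:
        print(">>> Unexpected Results")

The parts this can't express -- "nearest option", "select
enclosing td" -- are exactly the ones that need a parse tree
rather than a flat string.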

There are lots of html parsers, but the interesting bit here
is that the parse tree seems to be operated on as a whole --
at least that is how I envision operators like find and
select-enclosing working.  This is useful for all sorts of
things: represent some data as a tree, stick probes in it,
walk around the tree, transform it, reuse parts of it in
other trees, etc.  Then you can use it for munging any
structured document (email, source code, rcs files, excel,
xml, ...).  You'd need a parser to map a document's structure
into an s-expr, and then you can do all the interesting stuff
in this awk-for-s-expr language.
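
As a minimal sketch of that idea, assuming Python's standard
html.parser module (any parser that yields a tree with parent
links would do), here is a crude tree builder plus "find" and
"enclosing" operators, which is roughly how I picture the
language's primitives working:

    from html.parser import HTMLParser

    class Node:
        def __init__(self, tag, parent=None):
            self.tag, self.parent = tag, parent
            self.children, self.text = [], ""

    class TreeBuilder(HTMLParser):
        # turn start/end/data events into a crude parse tree
        def __init__(self):
            super().__init__()
            self.root = self.cur = Node("document")
        def handle_starttag(self, tag, attrs):
            node = Node(tag, self.cur)
            self.cur.children.append(node)
            self.cur = node
        def handle_endtag(self, tag):
            if self.cur.parent:
                self.cur = self.cur.parent
        def handle_data(self, data):
            self.cur.text += data

    def find(node, text):
        # depth-first search for a node whose text contains `text`
        if text in node.text:
            return node
        for child in node.children:
            hit = find(child, text)
            if hit:
                return hit
        return None

    def enclosing(node, tag):
        # walk up the parent chain to the nearest surrounding tag
        while node is not None and node.tag != tag:
            node = node.parent
        return node

    doc = "<table><tr><td><b>No information</b> found</td></tr></table>"
    tb = TreeBuilder()
    tb.feed(doc)
    hit = find(tb.root, "No information")
    print(enclosing(hit, "td").tag)    # -> td

With parent pointers in place, "select enclosing table" is just a
walk up the tree, and the same skeleton works for any structured
document once a parser has mapped it into this shape.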

Regular-tree expressions by Shivers & Bagrak may be of
some interest to you.  See
    http://www.cc.gatech.edu/fac/Olin.Shivers/papers/trx.pdf
