Russ Cox wrote:
I would like to be able to write scripts like this:

        load "http://apc-reset/outlets.htm";
        find "yoshimi"
        nearest option, set "Immediate Reboot"
        submit

or like this:

        load "http://www.fedex.com/Tracking";
        find form
        enter "792544024753"
        submit
        
        if (find "No information") {
           select enclosing td
           print
        } else if (find "Ship date") {
           select enclosing table
           select enclosing table
           print
        } else {
           print ">>> Unexpected Results\n"
           print
        }

Does anyone know of programs/languages that let you
script web sessions like that?  Searching around finds lots
of mentions of web scraping but no actual programs.


Well, Haskell has several HTML/XML parser packages[0] that parse a sequence of tags and return tree-like structures. Various queries can then be made over that tree, such as extracting tags with given properties, building new document trees, or extracting and transforming subtrees. There are also facilities to connect to web servers and retrieve HTTP responses, and, I believe, to submit forms too (although I have never tried the latter in practice; I have only used the GET method).
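
To make the tag-stream idea concrete, here is a minimal, self-contained sketch. It does not use any of those packages: Tag, tokenizeTags, openTags and hrefs are hypothetical names made up for illustration, and the tokenizer assumes well-formed markup with attributes written as key="value" (no comments, entities or self-closing tags). A real parser handles far more than this.

```haskell
import Data.Char (isSpace, toLower)

-- A flat stream of tags, the kind of structure such parsers start from.
data Tag
  = Open String [(String, String)]  -- element name and attributes
  | Close String                    -- element name
  | Txt String                      -- character data between tags
  deriving (Eq, Show)

-- Naive tokenizer (hypothetical): assumes well-formed markup, attributes
-- written as key="value", and no comments, entities or self-closing tags.
tokenizeTags :: String -> [Tag]
tokenizeTags [] = []
tokenizeTags ('<' : '/' : rest) =
  let (name, rest') = break (== '>') rest
  in Close (lower (trim name)) : tokenizeTags (drop 1 rest')
tokenizeTags ('<' : rest) =
  let (body, rest') = break (== '>') rest
      (name, attrText) = break isSpace body
  in Open (lower name) (parseAttrs attrText) : tokenizeTags (drop 1 rest')
tokenizeTags s =
  let (text, rest) = break (== '<') s
  in Txt text : tokenizeTags rest

-- Turns  key="value" key2="value2"  into an association list.
parseAttrs :: String -> [(String, String)]
parseAttrs s =
  case dropWhile isSpace s of
    "" -> []
    s' ->
      let (key, rest) = break (== '=') s'
          -- skip the '=' and the opening quote, read up to the closing quote
          (val, rest') = break (== '"') (drop 2 rest)
      in (lower (trim key), val) : parseAttrs (drop 1 rest')

trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

lower :: String -> String
lower = map toLower

-- Query: opening tags with the given name whose attributes satisfy p.
openTags :: String -> ([(String, String)] -> Bool) -> [Tag] -> [Tag]
openTags name p tags = [t | t@(Open n attrs) <- tags, n == name, p attrs]

-- Query: the href of every <a> tag.
hrefs :: [Tag] -> [String]
hrefs tags = [v | Open "a" attrs <- tags, ("href", v) <- attrs]
```

For instance, openTags "p" (elem ("class", "note")) picks out every <p class="note"> tag in the stream.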

With Haskell you can then create your own domain-specific language (DSL) for such tasks. In fact, your example looks as if you want exactly that: a DSL for analyzing web pages.
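
Here is a rough sketch of what an embedded DSL for the FedEx example might look like. It is a hypothetical design, not an existing library: Node, Cmd, run, fedex and the command names are all made up, and it works on a hand-built tree rather than on a page fetched over HTTP (a real version would get the tree from one of the parser packages).

```haskell
import Data.List (isInfixOf)
import Data.Maybe (fromMaybe, mapMaybe)

-- Hypothetical document tree; a real program would get it from an
-- HTML parser instead of building it by hand.
data Node = Elem String [Node] | Text String
  deriving (Eq, Show)

-- Hypothetical commands modelled on the script in the question.
data Cmd
  = Find String                 -- select the innermost node containing the text
  | SelectEnclosing String      -- widen to the nearest enclosing element of that name
  | Print                       -- emit the text of the current selection
  | Say String                  -- emit a literal message
  | IfFind String [Cmd] [Cmd]   -- like Find, but branch on whether it matched
  deriving Show

-- A selection is the path from the chosen node up to the root, innermost
-- first, so "select enclosing" just walks further along the list.
type Selection = [Node]

textOf :: Node -> String
textOf (Text s) = s
textOf (Elem _ kids) = concatMap textOf kids

-- Path to the innermost node whose text contains the needle, if any.
findPath :: String -> Node -> Maybe Selection
findPath needle n
  | not (needle `isInfixOf` textOf n) = Nothing
  | Elem _ kids <- n
  , (p : _) <- mapMaybe (findPath needle) kids
  = Just (p ++ [n])
  | otherwise = Just [n]

-- Move past the current node to the nearest ancestor with the given name;
-- if there is none, keep the selection unchanged.
enclosing :: String -> Selection -> Selection
enclosing name sel =
  case dropWhile (not . named) (drop 1 sel) of
    [] -> sel
    sel' -> sel'
  where
    named (Elem n _) = n == name
    named _ = False

-- Run a script against one page, returning the printed lines.
run :: Node -> [Cmd] -> [String]
run root = go [root]
  where
    go _ [] = []
    go sel (c : cs) = case c of
      Find s -> go (fromMaybe sel (findPath s root)) cs
      SelectEnclosing name -> go (enclosing name sel) cs
      Print -> concatMap textOf (take 1 sel) : go sel cs
      Say msg -> msg : go sel cs
      IfFind s yes no -> case findPath s root of
        Just p -> go p (yes ++ cs)
        Nothing -> go sel (no ++ cs)

-- The FedEx part of the question's script, written in this DSL.
fedex :: [Cmd]
fedex =
  [ IfFind "No information"
      [SelectEnclosing "td", Print]
      [ IfFind "Ship date"
          [SelectEnclosing "table", SelectEnclosing "table", Print]
          [Say ">>> Unexpected Results", Print]
      ]
  ]
```

Because a script here is ordinary data (a "deep" embedding), it can be inspected or printed before it runs; an alternative would be to make the commands monadic actions, so that plain Haskell if/then/else replaces IfFind.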

A very simple example of this can be found in my "cabalfind"[1] program, which parses a search engine (e.g. Google) response to find links with the required properties (pointing to files with the ".cabal"[2] suffix). cabalfind does not introduce a DSL, though.
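
The core of that link filtering fits in a few lines. This is not cabalfind's actual code, just a sketch: hrefValues and cabalLinks are names made up here, and a plain substring scan for href="..." stands in for a real HTML parser (so single-quoted or unquoted attributes are missed).

```haskell
import Data.Char (toLower)
import Data.List (isSuffixOf, nub, tails)

-- Every double-quoted href value in the page. The attribute name is
-- matched case-insensitively; single-quoted or unquoted values are missed.
hrefValues :: String -> [String]
hrefValues html =
  [ takeWhile (/= '"') (drop (length key) rest)
  | rest <- tails html
  , map toLower (take (length key) rest) == key
  ]
  where
    key = "href=\""

-- Distinct links whose path ends in ".cabal"; any query string or
-- fragment is ignored when checking the suffix but kept in the result.
cabalLinks :: String -> [String]
cabalLinks = nub . filter isCabal . hrefValues
  where
    isCabal url = ".cabal" `isSuffixOf` takeWhile (`notElem` "?#") url
```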

However, I see two issues here: someone (you?) has to learn Haskell, and, as far as Plan 9 is concerned, there is no Haskell implementation except an old Hugs, which would be too slow for this task. (I still plan to port GHC or NHC, but have not yet found the resources even to start.)

-------
[0] But there must be plenty of such libraries for Java; why don't you want to use one of those?

[1] http://www.golubovsky.org/repos/cabalfind,
    http://www.haskell.org/hawiki/CabalFind

[2] Cabal is the Haskell software package management system.
