Russ Cox wrote:
I would like to be able to write scripts like this:
    load "http://apc-reset/outlets.htm"
    find "yoshimi"
    nearest option, set "Immediate Reboot"
    submit
or like this:
    load "http://www.fedex.com/Tracking"
    find form
    enter "792544024753"
    submit
    if (find "No information") {
        select enclosing td
        print
    } else if (find "Ship date") {
        select enclosing table
        select enclosing table
        print
    } else {
        print ">>> Unexpected Results\n"
        print
    }
Does anyone know of programs/languages that let you
script web sessions like that? Searching around finds lots
of mentions of web scraping but no actual programs.
Well, Haskell has several HTML/XML parser packages[0] that parse a
sequence of tags and return tree-like structures, over which various
queries can then be made: extracting tags with given properties,
building new document trees, or extracting and transforming subtrees.
There are also facilities for connecting to web servers and retrieving
HTTP responses, and I believe for submitting forms too (although I have
never tried the latter in practice; I have only worked with the GET
method).
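To make the tree-query idea a little more concrete, here is a minimal
sketch in Haskell using only the base library. The Node type and the
helper names (universe, elementsWhere, innerText, mapText) are
hypothetical, made up for illustration; they are not the API of any
particular parser package. It shows the two kinds of operation mentioned
above: selecting elements by tag name and attributes, and rebuilding a
transformed tree. It can be loaded into GHCi as is.

```haskell
-- Hypothetical sketch (base only): a tree for a parsed page plus a few
-- generic queries and one transformation over it.
import Data.Char (toLower)

-- An element (tag name, attributes, children) or a piece of text.
data Node
  = Element String [(String, String)] [Node]
  | Text String
  deriving (Show, Eq)

-- Every node of the tree in document (pre-)order.
universe :: Node -> [Node]
universe n@(Element _ _ kids) = n : concatMap universe kids
universe n                    = [n]

-- Elements with the given tag name (case-insensitive) whose attributes
-- satisfy the predicate.
elementsWhere :: String -> ([(String, String)] -> Bool) -> Node -> [Node]
elementsWhere tag p doc =
  [ e
  | e@(Element t attrs _) <- universe doc
  , map toLower t == map toLower tag
  , p attrs
  ]

-- Concatenated text content of a subtree.
innerText :: Node -> String
innerText (Text s)           = s
innerText (Element _ _ kids) = concatMap innerText kids

-- Rebuild a tree with every text node transformed by f.
mapText :: (String -> String) -> Node -> Node
mapText f (Text s)               = Text (f s)
mapText f (Element t attrs kids) = Element t attrs (map (mapText f) kids)
```

For example, elementsWhere "a" (\as -> lookup "href" as /= Nothing) doc
would list all links in doc that carry an href attribute.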
On top of that, Haskell lets you create your own domain-specific
language (DSL) to perform such tasks. In fact, your examples suggest
that a DSL for analyzing web pages is exactly what you want.
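As a rough illustration of what such a DSL could look like, the sketch
below embeds the find / select enclosing / print part of the FedEx
script in Haskell. It assumes the page has already been fetched and
parsed into the simple tree type shown; the Script monad, the Cursor
type and all command names are hypothetical, and loading pages, filling
in forms and submitting them are left out entirely.

```haskell
-- Hypothetical sketch: an embedded DSL for the "find / select enclosing /
-- print" part of the FedEx script, over an already-parsed page.
import Data.List (isInfixOf)

-- Simplified parsed page: elements (tag name, children) and text.
data Node = Element String [Node] | Text String
  deriving (Show, Eq)

-- The current selection plus its ancestors, innermost first.
data Cursor = Cursor { focus :: Node, ancestors :: [Node] }

-- A script threads the cursor and collects printed lines.
newtype Script a = Script { runScript :: Cursor -> (a, Cursor, [String]) }

instance Functor Script where
  fmap f (Script g) = Script $ \c -> let (a, c', out) = g c in (f a, c', out)

instance Applicative Script where
  pure a = Script $ \c -> (a, c, [])
  Script f <*> Script g = Script $ \c ->
    let (h, c1, o1) = f c
        (a, c2, o2) = g c1
    in (h a, c2, o1 ++ o2)

instance Monad Script where
  Script g >>= k = Script $ \c ->
    let (a, c1, o1) = g c
        (b, c2, o2) = runScript (k a) c1
    in (b, c2, o1 ++ o2)

-- Text content of a subtree.
textOf :: Node -> String
textOf (Text s)         = s
textOf (Element _ kids) = concatMap textOf kids

-- Cursor at the document root.
root :: Cursor -> Cursor
root c@(Cursor _ []) = c
root (Cursor _ up)   = Cursor (last up) []

-- Every cursor in the subtree under the focus, in document order.
descendants :: Cursor -> [Cursor]
descendants c@(Cursor n@(Element _ kids) up) =
  c : concatMap (\k -> descendants (Cursor k (n : up))) kids
descendants c = [c]

-- find "s": select the first text node in the page containing s.
-- Returns False (selection unchanged) if there is none.
find :: String -> Script Bool
find s = Script $ \c ->
  case [d | d@(Cursor (Text t) _) <- descendants (root c), s `isInfixOf` t] of
    (d : _) -> (True, d, [])
    []      -> (False, c, [])

-- select enclosing "tag": move to the nearest ancestor with that tag.
selectEnclosing :: String -> Script Bool
selectEnclosing tag = Script $ \c -> go c (ancestors c)
  where
    go cur [] = (False, cur, [])
    go _ (a@(Element t _) : rest)
      | t == tag = (True, Cursor a rest, [])
    go cur (_ : rest) = go cur rest

-- print: emit the text of the current selection.
printSel :: Script ()
printSel = Script $ \c -> ((), c, [textOf (focus c)])

-- print "literal": emit a fixed line.
echo :: String -> Script ()
echo s = Script $ \c -> ((), c, [s])

-- The if/else part of the FedEx example.
report :: Script ()
report = do
  noInfo <- find "No information"
  if noInfo
    then selectEnclosing "td" >> printSel
    else do
      shipped <- find "Ship date"
      if shipped
        then selectEnclosing "table" >> selectEnclosing "table" >> printSel
        else echo ">>> Unexpected Results" >> printSel

-- Run a script from the root of a document, returning the printed lines.
runOn :: Node -> Script a -> [String]
runOn doc s = let (_, _, out) = runScript s (Cursor doc []) in out
```

The Script monad is hand-rolled only so that the sketch depends on
nothing but base; with the mtl package one would more likely assemble
it from StateT and Writer. The point is that Haskell's own do-notation
and if/else supply the control flow, so the DSL only has to define the
page-specific commands.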
A very simple example of this can be found in my "cabalfind"[1] program,
which parses a search engine (e.g. Google) response to find links with
the required properties (those pointing to files with the ".cabal"[2]
suffix). Note, though, that cabalfind does not introduce any DSL.
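The kind of link filtering described here could look roughly like the
following. This is a hypothetical sketch, not cabalfind's actual code:
it assumes the page has already been tokenized into a flat list of tags
(the Tag type is made up for the example) and simply keeps the href of
each <a> tag whose URL ends in ".cabal".

```haskell
-- Hypothetical sketch (base only): pick links with a given suffix out of
-- a flat stream of HTML tags.
import Data.Char (toLower)
import Data.List (isSuffixOf)

-- A flat token stream, roughly what an HTML tokenizer produces.
data Tag
  = TagOpen String [(String, String)]
  | TagClose String
  | TagText String
  deriving (Show, Eq)

-- Targets of <a href=...> tags whose URL ends in the given suffix.
linksWithSuffix :: String -> [Tag] -> [String]
linksWithSuffix suffix tags =
  [ url
  | TagOpen name attrs <- tags
  , map toLower name == "a"
  , Just url <- [lookup "href" attrs]
  , suffix `isSuffixOf` url
  ]

-- Links to Cabal package descriptions.
cabalLinks :: [Tag] -> [String]
cabalLinks = linksWithSuffix ".cabal"
```

Working on the flat tag stream rather than a tree is enough here,
because a link's properties are all in its opening tag.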
However, I see two issues here: someone (you?) has to learn Haskell,
and, as far as Plan 9 is concerned, there is no Haskell implementation
except an old version of Hugs, which would be too slow for this task. (I
still plan to port GHC or NHC, but have not yet found the resources even
to start porting.)
-------
[0] But there must be plenty of these for Java as well; why don't you
want to use them?
[1] http://www.golubovsky.org/repos/cabalfind,
http://www.haskell.org/hawiki/CabalFind
[2] Cabal is the Haskell software package management system.