Solvent - Multiple Page Scraping

Serkan Serttop Wed, 28 Nov 2007 17:15:35 -0800

Hi Everyone,
I apologize if this has been answered before; I could
not find an answer in the archive. I have been
following Simile's code for a while and played around
with Timeplot, Timeline, and Exhibit on my localhost
some time ago.

Recently I have been trying to use Solvent and Piggy Bank. I
used Solvent to grab the content from one page with no
problem, but there is not enough explanation of how to
do it for multiple pages. I ran into another thread
that ends with a question from David Morris, and that
question seems to have gone unanswered since
February 2007.

I am reposting that question here, as I have exactly the
same question.
Serkan Serttop

Dave's post from
http://simile.mit.edu/mail/ReadMsg?listId=9&msgId=14587

I'm finally getting to try this screen scraping thing
out, and I'm not sure how to use the pattern matching
function to scrape multiple pages.

I see this code on one of the sample scrapers:

match: "^http://www\.vacancyguide\.com/rentals/search\.cfm\?.*$"

But I don't see that anywhere inside the actual
scraper code. The site I'm trying to scrape has a
pretty simple URL pattern, but I don't know where to
put it into the code:

http://www.newfarm.com/farmlocator/farm_detail.php?ID=#

where # is a number from 1 to 1200 or so. Can anyone
point me in the right direction here? Thanks,

Dave
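A minimal sketch of the pattern-matching side of the question, assuming the `match:` field is an ordinary regular expression tested against the full page URL (the post does not show how Piggy Bank actually uses it). It uses Python's standard `re` module, not any Solvent or Piggy Bank API. `FARM_PATTERN`, `url_matches`, and `farm_urls` are hypothetical names; the farm regex is written by analogy with the sample pattern; the search URL with `city=Boston` is a made-up example; and the upper bound of 1200 is only Dave's "1 to 1200 or so" estimate.

```python
import re

# The pattern quoted from the sample scraper's "match:" field.
VACANCY_PATTERN = re.compile(r"^http://www\.vacancyguide\.com/rentals/search\.cfm\?.*$")

# Hypothetical pattern in the same style for the farm detail pages:
# dots and "?" are escaped, and ID must be one or more digits.
FARM_PATTERN = re.compile(r"^http://www\.newfarm\.com/farmlocator/farm_detail\.php\?ID=\d+$")

# Template for one farm detail page; "{}" is replaced by the numeric ID.
FARM_URL = "http://www.newfarm.com/farmlocator/farm_detail.php?ID={}"


def url_matches(pattern: re.Pattern, url: str) -> bool:
    """True if the whole URL matches the pattern (assumed "match:" semantics)."""
    return pattern.match(url) is not None


def farm_urls(last_id: int = 1200) -> list[str]:
    """Detail-page URLs for IDs 1..last_id; 1200 is only the post's estimate."""
    return [FARM_URL.format(i) for i in range(1, last_id + 1)]


urls = farm_urls()
print(len(urls), urls[0], urls[-1])
# Every generated URL should be covered by the single farm pattern.
print(all(url_matches(FARM_PATTERN, u) for u in urls))
print(url_matches(VACANCY_PATTERN, "http://www.vacancyguide.com/rentals/search.cfm?city=Boston"))
```

The idea is that one pattern describes the whole family of pages, while the numbered list is the set of pages that pattern has to cover. Whether Piggy Bank can be pointed at a generated list of URLs like this is not established by the post.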

_______________________________________________
General mailing list
http://simile.mit.edu/mailman/listinfo/general
