Re: Crowbar and multiple page scraping

Ryan Lee Mon, 01 Oct 2007 18:13:22 -0700

Stefano Mazzocchi wrote:
> Kimble Young wrote:
>> Hi,
>>
>> I've been looking at Solvent and Crowbar for doing some of my own mashups but
>> using my own APIs, database etc. Crowbar is very
>> promising and I had some luck with it initially but it seems that I've
>> hit a wall with multiple page scraping.
>>
>> Multi-page scraping would be invaluable in making Crowbar and Solvent a
>> very powerful solution for people who want to make their own mashups
>> outside of the environment provided by Piggy Bank.
>>
>> Do we know what's involved in making multi-page scraping happen? Is it a
>> complex solution?
> 
> Crowbar can do anything that Piggy Bank + Solvent can do and multi-page
> scraping has been in Piggy Bank for quite some time.


Actually, it can't run a multi-page scraper.  We have actively tried to 
make it work; the multi-page part should silently open up a new, hidden 
frame in the Crowbar URL display and proceed with scraping there, but 
for whatever reason it exits instead.

We had a contributor who was looking into improving Crowbar.  His 
patches made it into Crowbar, improving its reliability; last I heard, 
he was going to look into the multi-page problem, but it's been some 
time since I heard from him.

Before his patches, I had a vague idea of what was going on.  With his 
patches, which I haven't had the opportunity to review in depth, I have 
even less of an idea - maybe the solution is complex, maybe there's a 
typo that needs fixing.  I couldn't really say at this point.

-- 
Ryan Lee                  [EMAIL PROTECTED]
MIT CSAIL Research Staff  http://simile.mit.edu/
http://people.csail.mit.edu/ryanlee/
_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general
