Try Acme.Spider at Acme.com.
Regards,
Mark Wardell

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Corey Wineman
Sent: Tuesday, May 09, 2000 3:14 PM
To: [EMAIL PROTECTED]
Subject: spider


Hello,

I have just joined this mailing list. I haven't seen any messages and don't
know if anyone is listening.
Anyway, I have been working on a webspider for my company for some time now.
I inherited much of the code from a previous employee. It is written
completely in Java, and I have spent a long time trying to make it run
properly. It is still plagued by memory leaks and networking problems. The
biggest problems have been dealing with threading, recognizing black holes
(spider traps) and keeping track of a huge number of nodes.
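For the node-tracking and black-hole problems, one common approach is a
single thread-safe set of canonicalized URLs, combined with cheap heuristics
that reject URLs which grow too long or keep repeating the same path segment
(calendars and session-ID loops often look like that). This is a minimal
Java sketch under assumptions: it expects absolute http(s) URLs, and the
class name VisitTracker and both limits are hypothetical, not taken from
Acme.Spider or from the inherited code.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Acme.Spider or any other library.
// Tracks visited URLs in a thread-safe set and refuses URLs that look like
// crawler "black holes" (endlessly generated pages).
final class VisitTracker {
    private final Set<String> visited = ConcurrentHashMap.newKeySet();
    private final int maxUrlLength;      // assumed cap; real traps vary by site
    private final int maxSegmentRepeats; // e.g. /cal/cal/cal/... is suspicious

    VisitTracker(int maxUrlLength, int maxSegmentRepeats) {
        this.maxUrlLength = maxUrlLength;
        this.maxSegmentRepeats = maxSegmentRepeats;
    }

    // Canonical key: lower-case scheme and authority (host[:port]), resolve
    // "." and ".." path segments, keep the query, drop the #fragment.
    // URI.create throws IllegalArgumentException on malformed input.
    static String normalize(String url) {
        URI u = URI.create(url).normalize();
        String scheme = u.getScheme() == null ? "" : u.getScheme().toLowerCase();
        String auth = u.getRawAuthority() == null ? "" : u.getRawAuthority().toLowerCase();
        String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();
        String query = u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
        return scheme + "://" + auth + path + query;
    }

    // Heuristic trap check on an already-normalized URL.
    boolean looksLikeTrap(String normalizedUrl) {
        if (normalizedUrl.length() > maxUrlLength) return true;
        Map<String, Integer> counts = new HashMap<>();
        for (String seg : URI.create(normalizedUrl).getRawPath().split("/")) {
            if (!seg.isEmpty() && counts.merge(seg, 1, Integer::sum) > maxSegmentRepeats) {
                return true;
            }
        }
        return false;
    }

    // True exactly once per canonical URL, and never for a suspected trap.
    // add() on a concurrent key set is atomic, so many crawler threads can
    // call this without extra locking.
    boolean shouldVisit(String url) {
        String key = normalize(url);
        return !looksLikeTrap(key) && visited.add(key);
    }

    int visitedCount() {
        return visited.size();
    }
}
```

Each crawler thread would call shouldVisit before fetching a URL. For a very
large crawl an in-memory set still grows without bound, so a disk-backed
store or a Bloom filter may be needed; that trade-off is not shown here.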
What I want to do is traverse a site and process certain files, storing the
results (for example, whether the file meets certain criteria, the site's IP
address, and when I visited it) in a database. I would like the spider to be
configurable: limiting the depth from a source URL, limiting how deep it will
go into external sites, and setting defaults for the various timeouts.
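The depth and timeout options can be sketched as a breadth-first crawl that
carries two counters per URL: hops from the seed, and consecutive hops spent
off the seed's host. This is a minimal sketch under assumptions: CrawlConfig,
DepthLimitedCrawler and the pluggable linkSource function (which would fetch
a page and return its absolute links) are hypothetical names, and HTML
parsing and database storage are left out. The open helper only shows where
the standard HttpURLConnection setConnectTimeout and setReadTimeout calls
would apply the configured timeouts.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical settings object; the field names are illustrative only.
final class CrawlConfig {
    final int maxDepth;         // max link hops from the seed URL
    final int maxExternalDepth; // max consecutive hops spent off the seed's host
    final int connectTimeoutMs; // TCP connect timeout
    final int readTimeoutMs;    // timeout while waiting for response data

    CrawlConfig(int maxDepth, int maxExternalDepth, int connectTimeoutMs, int readTimeoutMs) {
        this.maxDepth = maxDepth;
        this.maxExternalDepth = maxExternalDepth;
        this.connectTimeoutMs = connectTimeoutMs;
        this.readTimeoutMs = readTimeoutMs;
    }
}

final class DepthLimitedCrawler {
    private static final class Node {
        final String url;
        final int depth;
        final int externalDepth;

        Node(String url, int depth, int externalDepth) {
            this.url = url;
            this.depth = depth;
            this.externalDepth = externalDepth;
        }
    }

    private final CrawlConfig cfg;
    // Assumed hook: fetch a page and return its absolute outgoing links.
    // Keeping it pluggable separates the traversal from networking and parsing.
    private final Function<String, List<String>> linkSource;

    DepthLimitedCrawler(CrawlConfig cfg, Function<String, List<String>> linkSource) {
        this.cfg = cfg;
        this.linkSource = linkSource;
    }

    // Breadth-first crawl; returns URLs in the order they were visited.
    List<String> crawl(String seed) {
        String seedHost = host(seed);
        List<String> visitOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(new Node(seed, 0, 0));
        seen.add(seed);
        while (!queue.isEmpty()) {
            Node n = queue.poll();
            visitOrder.add(n.url);
            if (n.depth >= cfg.maxDepth) continue; // at the limit: record, don't expand
            for (String link : linkSource.apply(n.url)) {
                // Links back on the seed's host reset the external counter.
                int ext = seedHost.equals(host(link)) ? 0 : n.externalDepth + 1;
                if (ext <= cfg.maxExternalDepth && seen.add(link)) {
                    queue.add(new Node(link, n.depth + 1, ext));
                }
            }
        }
        return visitOrder;
    }

    private static String host(String url) {
        String h = URI.create(url).getHost();
        return h == null ? "" : h.toLowerCase();
    }

    // Creates (but does not yet connect) an HTTP connection with the timeouts.
    static HttpURLConnection open(String url, CrawlConfig cfg) throws IOException {
        HttpURLConnection c = (HttpURLConnection) URI.create(url).toURL().openConnection();
        c.setConnectTimeout(cfg.connectTimeoutMs);
        c.setReadTimeout(cfg.readTimeoutMs);
        return c;
    }
}
```

With maxDepth = 2 and maxExternalDepth = 1, a page on another host that the
seed links to is visited, but links from it to further external pages are
not followed. Injecting linkSource also lets the traversal be tested against
an in-memory link graph without touching the network.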

Does anyone know of a webspider that does some of these things and is
available along with the source code?

Thanks,
Corey
