Try Acme.Spider at Acme.com.

Regards,
Mark Wardell

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Corey Wineman
Sent: Tuesday, May 09, 2000 3:14 PM
To: [EMAIL PROTECTED]
Subject: spider

Hello,

I have just joined this mailing list. I haven't seen any messages yet and don't know if anyone is listening.

Anyway, I have been working on a web spider for my company for some time now. I inherited much of the code from a previous employee. It is written completely in Java, and I have spent a long time trying to make it run properly. It is still plagued with memory leaks and networking problems. The biggest problems have been dealing with threading, recognizing black holes, and keeping track of a huge number of nodes.

What I want to do is traverse a site and process certain files, storing the results in a database (things like whether the file meets certain criteria, the IP address of the site, and when I visited it).

I would like to be able to configure the spider: limiting the depth from a source URL, limiting how deep it will search into external sites, and setting the defaults for various timeouts.

Does anyone know of a web spider that does some of these things and is available along with the source code?

Thanks,
Corey
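
For illustration only, the configuration asked for above (a depth limit from the source URL, a separate limit for external sites, and default timeouts) could be sketched in Java roughly as follows. This is not code from Acme.Spider or from the spider described in the message. The class names CrawlConfig and CrawlFrontier, their fields, and the rule that "external depth" counts consecutive hops away from the source host are all assumptions made for the example. The frontier does no fetching itself; it only decides which discovered links are worth queuing.

```java
import java.net.URI;
import java.net.URLConnection;
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Hypothetical settings holder; the names are assumptions made for this sketch. */
final class CrawlConfig {
    final int maxDepth;          // hops allowed from the source URL
    final int maxExternalDepth;  // consecutive hops allowed off the source host
    final int connectTimeoutMs;  // default connect timeout
    final int readTimeoutMs;     // default read timeout

    CrawlConfig(int maxDepth, int maxExternalDepth, int connectTimeoutMs, int readTimeoutMs) {
        this.maxDepth = maxDepth;
        this.maxExternalDepth = maxExternalDepth;
        this.connectTimeoutMs = connectTimeoutMs;
        this.readTimeoutMs = readTimeoutMs;
    }

    /** Applies the default timeouts to a connection before it is opened. */
    void applyTimeouts(URLConnection conn) {
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
    }
}

/** Breadth-first frontier that enforces the depth limits and never queues a URL twice. */
final class CrawlFrontier {
    /** One queued URL plus its distance from the source and from the source host. */
    static final class Node {
        final URI uri;
        final int depth;
        final int externalDepth;

        Node(URI uri, int depth, int externalDepth) {
            this.uri = uri;
            this.depth = depth;
            this.externalDepth = externalDepth;
        }
    }

    private final CrawlConfig config;
    private final String sourceHost;
    private final Queue<Node> queue = new ArrayDeque<>();
    private final Set<URI> seen = new HashSet<>();

    CrawlFrontier(URI source, CrawlConfig config) {
        URI start = source.normalize();
        this.config = config;
        this.sourceHost = start.getHost();
        seen.add(start);
        queue.add(new Node(start, 0, 0));
    }

    /** Returns the next URL to fetch, or null when nothing is left to crawl. */
    Node next() {
        return queue.poll();
    }

    /** Offers a link found on the parent page; returns true if it was queued. */
    boolean offer(Node parent, URI link) {
        URI target = link.normalize();
        String host = target.getHost();
        if (host == null) {
            return false; // not a fetchable URL for this sketch (e.g. mailto:)
        }
        int depth = parent.depth + 1;
        boolean external = !host.equalsIgnoreCase(sourceHost);
        int externalDepth = external ? parent.externalDepth + 1 : 0;
        if (depth > config.maxDepth || externalDepth > config.maxExternalDepth) {
            return false; // beyond a configured limit
        }
        if (!seen.add(target)) {
            return false; // already queued or visited
        }
        queue.add(new Node(target, depth, externalDepth));
        return true;
    }
}
```

A crawl loop would call next(), fetch the page with the timeouts applied, and pass every link it finds to offer(). The visited set stops revisits of the same URL, and the depth limit bounds sites that generate endless new URLs, which is one simple way to contain black holes. Note that the visited set grows with every URL seen, so a very large crawl would need something more compact than an in-memory HashSet.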