> >>like to use a spider/crawler. We are running RedHat so Verity's spider
> >>is out of the question. Does anyone know/have/suggest a spider/crawler
> >>routine, custom tag or application that can be used to recursively
> >>fetch the content of our site for indexing? We would strongly prefer
> >>something free but might consider others. I can come up with a routine
> >>using cfhttp but I am afraid of performance issues.
If you are looking for just a spider (no indexing) and you are comfortable
with Java, I have used the Acme libraries
(http://www.acme.com/java/software/) with much success. Check out web
cat, web grep, and web copy. The programs are written a bit oddly, but
they are helpful, and the source is included for your coding pleasure.
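To give a rough idea of what a recursive spider does, here is a minimal sketch in plain Java. It is not the Acme code and does not use the Acme API; the class name SiteSpider, the methods extractLinks and crawl, and the maxPages limit are all made up for illustration. The fetch step is passed in as a function so you can plug in whatever HTTP client you prefer, and the href regex is deliberately naive (a real HTML parser would be more robust).

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; names are assumptions, not a library API.
public class SiteSpider {
    // Naive matcher for href="..." or href='...'; fragments (#...) are dropped.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the links found in html as absolute URLs, resolved against baseUrl. */
    public static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not syntactically valid URIs.
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl starting at start, following only links on the
     * same host. fetch maps a URL to its HTML, or null if it cannot be read.
     * Stops after maxPages distinct URLs have been visited.
     */
    public static Set<String> crawl(String start, Function<String, String> fetch, int maxPages) {
        String host = URI.create(start).getHost();
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already seen
            }
            String html = fetch.apply(url);
            if (html == null) {
                continue; // unreachable page; nothing to follow
            }
            for (String link : extractLinks(url, html)) {
                String linkHost = URI.create(link).getHost();
                if (host != null && host.equals(linkHost) && !visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

The set returned by crawl is the list of URLs you would then hand to your indexer. Give the start URL with a trailing slash (for example http://example.com/) so that relative links resolve against a proper base path.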
If you need everything, I have used Lucene (which I think has been
mentioned on this thread) with much success. Here is an article on how
to use it in Java, which is translatable to CF if you know a bit of
Java:
http://www.javaworld.com/javaworld/jw-09-2000/jw-0915-lucene.html
Cheers,
--
Rob <[EMAIL PROTECTED]>