Re: Help! A spider...

Rob Mon, 19 Apr 2004 07:30:15 -0700

> >>Our current indexing routine leaves much to be desired and we would
> >>like to use a spider/crawler. We are running RedHat so Verity's spider
> >>is out of the question. Does anyone know/have/suggest a spider/crawler
> >>routine, custom tag or application that can be used to recursively
> >>fetch the content of our site for indexing? We would strongly prefer
> >>something free but might consider others. I can come up with a routine
> >>using cfhttp but I am afraid of performance issues.

If you only need a spider (no indexing) and you are comfortable with
Java, I have used the Acme libraries at
http://www.acme.com/java/software/ with much success. Check out web
cat, web grep, and web copy. The programs are written a bit oddly, but
they are helpful (source included for your coding pleasure).
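
To give a feel for what a spider boils down to, here is a rough sketch in
plain Java (standard library only). It is not the Acme code and does not
use its API; the class name SiteSpider, its methods, the regex, and the
fetcher callback are all made up for illustration. It assumes you supply
your own page fetcher, only follows links on the starting host, and uses a
naive href regex where a real crawler would want a proper HTML parser.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class SiteSpider {
    // Naive href matcher (assumption: quoted attribute values, fragments ignored).
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Returns every href in the page, resolved against the page's own URL. */
    static List<URI> extractLinks(URI base, String html) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()).normalize());
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it rather than abort the crawl.
            }
        }
        return links;
    }

    /**
     * Breadth-first crawl from start, staying on the same host.
     * fetcher returns the page body, or null if the page cannot be fetched.
     * Returns URL -> body for at most maxPages pages, in visit order.
     */
    static Map<URI, String> crawl(URI start, Function<URI, String> fetcher, int maxPages) {
        Map<URI, String> pages = new LinkedHashMap<>();
        Set<URI> seen = new HashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty() && pages.size() < maxPages) {
            URI url = queue.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue;
            }
            pages.put(url, html);
            for (URI link : extractLinks(url, html)) {
                boolean sameHost = start.getHost() != null
                    && start.getHost().equalsIgnoreCase(link.getHost());
                // seen.add returns false for URLs already queued, which prevents loops.
                if (sameHost && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return pages;
    }
}
```

The fetcher is passed in so you can plug in whatever HTTP client you like
and hand each returned page body straight to your indexer. The page cap is
there to keep a first run from hammering the server.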

If you need everything (indexing and search as well), I have used Lucene
(which I think has already been mentioned on this thread) with much
success. Here is an article on how to use it in Java, which translates
to CF if you know a bit of Java:
http://www.javaworld.com/javaworld/jw-09-2000/jw-0915-lucene.html

Cheers,
--
Rob <[EMAIL PROTECTED]>
