We evaluated Verity for several weeks and had a consultant on site helping us for a few days. We were favorably impressed with the product but ended up choosing Lucene after a week or so of comparing the two head-to-head. Had our requirements been different, I can see Verity as being a superior choice in many instances. For starters, it does a whole lot more 'out of the box'. However, we have had great success with Lucene thus far - no regrets.
We are indexing a large corpus of XML documents (~10M). One thing that Verity does with XML documents is index each XML tag as a zone.* What's cool about it is that the zones are nested, so they mirror the schema of your XML document. You can limit your search to any part of the document by searching on specific zones.

A Verity zone is analogous to a Lucene field. Verity also has 'field' indexes - but these are a different kind of index that Lucene does not have. Verity fields allow you to index various numeric types, date types, etc. side-by-side with your textual index. The edge that Verity zones have over Lucene fields is that they are nested. However, nested fields can be simulated quite easily in Lucene by doing redundant indexing, i.e. indexing the same text once under the field for each tag that encloses it. I have a hunch this is what Verity does anyway, because their indexes are HUGE. Verity zones may mean different things for different kinds of indexed documents.

Incidentally, we found that the indexing speed of Lucene was much faster. The K2Spider could spend days optimizing an index. Verity seemed to be faster for retrievals, but the two compared well. We ran a lot of tests, but in the end our results were sort of 'touchy-feely'. We decided that Lucene was plenty fast for us.

Regards,
Philip

*Not each instance of a tag, but rather a zone for each kind of tag.

-----Original Message-----
From: Joe Lerner [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 25, 2002 1:51 PM
To: [EMAIL PROTECTED]
Subject: Zones

Hi,

We use Verity, a commercial vendor, for our Search, but were in serious trouble with its performance, and looking for a solid, more economical, open source alternative, like Lucene. A prototype we built using Lucene compared favorably with Verity, but then along came "zones". Verity tech support helped us re-configure our indices with "zones", giving us a fivefold increase in performance. Note, "Zones" are a separate, non-fielded, word list with addressing maps (each word mapped to an address/document).
Is anyone familiar with Verity "zones"? Does Lucene implement "zones" in its own way? How?

-Joe

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
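The "redundant indexing" idea from the reply above could be sketched roughly as follows. This is only an illustration under assumptions, not what Verity or Lucene actually does internally: it uses the JDK's DOM parser rather than any Lucene API, and the class name `NestedZoneFields`, the method `zoneFields`, and the slash-separated field names are all made up for the example. The text of every element is stored again under each enclosing element's path, so a search restricted to an outer "zone" also sees the text of everything nested inside it. In a real index, each map entry would presumably become one field on the indexed document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NestedZoneFields {

    /**
     * Parses the XML and returns one text value per element path.
     * Paths (e.g. "book/chapter") are hypothetical field names; all elements
     * sharing a path contribute to the same field (one zone per kind of tag).
     */
    public static Map<String, String> zoneFields(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        Map<String, StringBuilder> buffers = new LinkedHashMap<>();
        collect(root, root.getTagName(), buffers);

        Map<String, String> fields = new LinkedHashMap<>();
        buffers.forEach((name, text) -> fields.put(name, text.toString().trim()));
        return fields;
    }

    // Gathers all text below `element` (its own text plus that of nested
    // elements), records it under `path`, and returns it to the caller so the
    // parent can record the same text again. That repetition is the
    // "redundant" part.
    private static String collect(Element element, String path,
                                  Map<String, StringBuilder> buffers) {
        StringBuilder text = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                String value = child.getNodeValue().trim();
                if (!value.isEmpty()) {
                    text.append(value).append(' ');
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element nested = (Element) child;
                text.append(collect(nested, path + "/" + nested.getTagName(), buffers));
            }
        }
        buffers.computeIfAbsent(path, key -> new StringBuilder()).append(text);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><title>Zones</title>"
                + "<chapter><para>nested text</para></chapter></book>";
        // Print each simulated zone and the text indexed under it.
        zoneFields(xml).forEach((field, text) -> System.out.println(field + " -> " + text));
    }
}
```

The cost is visible in the sketch: text nested three levels deep is stored three times, which would fit the observation that an index built this way gets large.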
