Jesse Erlbaum scribbled on 3/19/07 11:07 AM:
Hi Peter --

As one of the Swish-e developers, I can second Michael's endorsement. ;)

I've used Swish-e many times over the years.  Do you think it is
lacking in any way compared to Xapian or Plucene?  What about compared
to commercial systems such as Verity or Autonomy?

I can't compare Swish-e to any commercial systems since I haven't used them.

As Tim just posted, I wouldn't even consider Plucene at this point in its life cycle, unless you're talking about a small, fairly static site. Performance is Not Good.

Compared to Xapian: Swish-e doesn't do UTF-8 or good incremental indexing. Swish-e is very fast at both indexing and search; I did see a benchmark once that showed Swish-e was faster than Xapian, but that was some time ago.

Xapian svn has a UTF-8 version; it's due to be official with the 1.0 release, whenever that is.

Xapian does have good incremental indexing. It's also a library (unlike Swish-e) so it has some more flexibility.

Swish-e has a built-in HTML/XML parser, which is pretty good (especially if you use libxml2). Xapian has Omega, an additional package that does some HTML parsing iirc.

I find Swish-e "just works" a little more "out of the box" than Xapian, but the two big missing features (UTF-8 and incremental indexing) are show-stoppers if your project requires them.

See also my article here, comparing Lucene, Xapian and Swish-e:
http://dewey.library.nd.edu/mylibrary/manual/ch/ch17.html



I've felt that Swish-e was a bit "long in the tooth" owing to its
legacy.  Do you disagree?  If you were not deeply involved in the
Swish-e project, would you choose it over Xapian or Plucene (or any
other system)?

Swish-e is old, true. The 2.x versions added a lot of features, but those are getting on 6 years old now too. Still, things that Work don't need to be New, do they? ;)

Would I choose it over Plucene? Not a question. Xapian? Well, it would depend on a couple things:

(1) data set. Am I indexing data that is fairly static or mostly dynamic? Example: static HTML or PDF docs, vs providing fulltext search for a database. Swish-e is fast enough and has merging and multi-index search features that let you get around the lack of incremental indexing, so if your data doesn't change much, I'd go with Swish-e. If it does change a lot (e.g., you need to update your index every time you update your db), then I'd probably go with Xapian.
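
A rough sketch of the workaround described in (1), assuming a large index that is rebuilt rarely plus a small "delta" index that is rebuilt often. This is not Swish-e's API or index format; the in-memory dictionaries and the names build_index, search and merge are hypothetical, chosen only to show the shape of the idea: search several indexes at once, and fold the small one into the big one now and then.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def search(indexes, word):
    """Search several indexes in one go and combine the hits."""
    hits = set()
    for index in indexes:
        hits |= index.get(word.lower(), set())
    return sorted(hits)


def merge(first, second):
    """Fold two indexes into a new one, leaving the inputs untouched."""
    merged = {word: set(ids) for word, ids in first.items()}
    for word, ids in second.items():
        merged.setdefault(word, set()).update(ids)
    return merged


# Big, mostly static index: rebuilt rarely (hypothetical sample data).
main_index = build_index({
    "a.html": "swish-e is fast",
    "b.html": "xapian is a library",
})

# Small index holding only recent documents: cheap to rebuild often.
delta_index = build_index({"c.html": "fast incremental updates"})

# Multi-index search covers both without touching the big index.
print(search([main_index, delta_index], "fast"))

# Periodically fold the delta into the main index and start a new delta.
main_index = merge(main_index, delta_index)
print(search([main_index], "fast"))
```

Note that this toy version only handles new documents. Changed or deleted documents would still need a rebuild of whichever index holds them, which is where true incremental indexing earns its keep.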

(2) i18n. Swish-e was first written back in the mid-90s so Unicode wasn't even a consideration. There are lots of optimizations in the C code that assume 1 byte = 1 character and so things like UTF-8 Just Don't Work.

You can get around that (as I do) with things like Search::Tools::Transliterate (shameless plug) but if you need Real I18N Support, I'd be going with Xapian.
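
To make point (2) concrete, here is a small illustration of why "1 byte = 1 character" breaks on UTF-8, and of the general transliterate-before-indexing workaround. It is only a sketch under stated assumptions: bytewise_tokens is a hypothetical stand-in for a byte-oriented word splitter, not Swish-e's actual C code, and to_ascii uses Unicode decomposition from Python's standard library, which is not necessarily how Search::Tools::Transliterate does it.

```python
import re
import unicodedata


def bytewise_tokens(data):
    """Hypothetical byte-oriented splitter: one byte is treated as one
    character, and only ASCII letters and digits count as word characters."""
    tokens, current = [], bytearray()
    for byte in data:
        if byte < 128 and chr(byte).isalnum():
            current.append(byte)
        elif current:
            tokens.append(bytes(current))
            current = bytearray()
    if current:
        tokens.append(bytes(current))
    return tokens


def unicode_tokens(text):
    """Character-aware splitter: \\w matches Unicode letters in str patterns."""
    return re.findall(r"\w+", text)


def to_ascii(text):
    """Strip accents by decomposing characters and dropping combining marks.
    Characters with no ASCII base form are simply dropped."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", "ignore").decode("ascii")


text = "na\u00efve caf\u00e9"        # two accented letters
encoded = text.encode("utf-8")

print(len(text), len(encoded))       # 10 characters, 12 bytes
print(bytewise_tokens(encoded))      # [b'na', b've', b'caf']
print(unicode_tokens(text))          # both words survive intact
print(to_ascii(text))                # naive cafe
```

The multi-byte characters chop words in half for the byte-oriented splitter, while the transliterated text indexes cleanly. The price is that accented and unaccented spellings become indistinguishable, and scripts with no ASCII equivalent are lost entirely, which is why this is a workaround rather than real i18n support.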



That said, I would suggest Xapian over Plucene, hands down.

Why do you say?  Any particular gripes?


See Tim's recent post.



And I would also check out KinoSearch, which is (along with Xapian) going to be one of the optional backend IR libraries for the next version of Swish-e.

I've heard of KinoSearch before, but I've not tried it out.  It seemed
that Plucene and Xapian had more established Perl interfaces, but I
could be wrong.


Tim summarized it well. KinoSearch is new-ish, but Marvin is really cranking out some quality stuff. And it's all C and Perl, so if those are your primary languages, the barrier to hacking on it yourself is all the lower.

If I had to do a fully UTF-8 compatible, robust, incremental, highly scalable search application tomorrow, I'd be looking seriously at KS and Xapian. Which is why Swish-e will offer those two (among others) as backends for version 3. :)

pek

--
Peter Karman  .  http://peknet.com/  .  [EMAIL PROTECTED]
