David Nicol wrote on 6/8/11 10:59 AM:
> Just for conversation's sake, anyone familiar enough with Lucy and with SQLite
> FTSE to compare and contrast?

Good topic for conversation, David.

I've read over the SQLite full-text search docs[0], and off the cuff I'd say there are pros and cons to both approaches. The architecture underlying both is basically the same: an inverted index of tokenized terms.

Obviously, if you want to provide search on top of an existing SQLite database, using the built-in FTS features is very convenient. If your text is mostly ASCII and you don't need custom tokenizing (or stemming beyond the supplied Porter stemmer), then SQLite will probably serve you well for small-to-medium projects.

If you need to scale your search application beyond a few gigabytes of data, or your document collection isn't already in a SQLite database, or you need i18n support (especially stemming in multiple languages), then you're probably going to need an information retrieval (IR) library like Lucy. First, it's a library, so you can customize your indexing and searching code to fit your particular application. Second, you use it from Perl (which for this audience should be a win). Third, it provides very flexible tokenizing and stemming options (Lucy ships with Snowball support).

Lucy is in the same camp as Lucene, Sphinx, Xapian, etc. It's for when you Get Serious about your search application.

[0] http://www.sqlite.org/fts3.html

-- 
Peter Karman . http://peknet.com/
_______________________________________________
kc mailing list
http://mail.pm.org/mailman/listinfo/kc
