Re: [Gossip] search library battle royale
On Tue, Sep 05, 2006 at 09:39:46PM -0700, Jeff Breidenbach wrote:
> By private email, one person noted that Xapian is not doing so hot on Czech. As in not returning search results.

I'm a little surprised that anything gets found that isn't in ASCII, since Omega's indexer is generating words from utf-8 text but treating it as iso-8859-1 (unless JeffB has patched it). Since the second and subsequent bytes of any multibyte utf-8 sequence are control codes or symbols in iso-8859-1, not letters, that means you'll get a word break after any non-ASCII character, even if it's an accented letter.

But anyway, utf-8 support for Xapian is pretty much written now. I've just sent JeffB a patch to try (but it'll require a reindex, so it'll probably not be live for a while).

> Another managed to craft a query that causes PyLucene to throw an exception.

Any query which it can't parse seems to do that. Try searching for * or , for example.

> In the meantime, I'd love to hear some pointed comments on the user interface. Do people prefer Omega's search engine standard layout, or PyLucene's blend into service appearance?

FWIW, I think I prefer the blend-in look. But the look is essentially orthogonal to the choice of engine - it's just a matter of slotting the appropriate HTML into the templates.

Cheers,
Olly
___
Discussion list for The Mail Archive
Gossip@jab.org
http://jab.org/cgi-bin/mailman/listinfo/gossip
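To illustrate the word-break effect described in the message above, here is a minimal Python sketch. It is not Omega's code: the letter classification (ASCII letters plus 0xC0-0xFF, excluding the multiplication and division signs) and the function names `is_latin1_letter` and `split_as_latin1` are assumptions made for this example.

```python
def is_latin1_letter(byte):
    """Assumed ISO-8859-1 letter test: ASCII letters plus 0xC0-0xFF,
    excluding the multiplication (0xD7) and division (0xF7) signs."""
    if 0x41 <= byte <= 0x5A or 0x61 <= byte <= 0x7A:
        return True
    return 0xC0 <= byte <= 0xFF and byte not in (0xD7, 0xF7)


def split_as_latin1(utf8_bytes):
    """Split UTF-8 bytes into 'words' as if each byte were one
    ISO-8859-1 character (hypothetical stand-in for the indexer)."""
    words = []
    current = bytearray()
    for byte in utf8_bytes:
        if is_latin1_letter(byte):
            current.append(byte)
        elif current:
            # A non-letter byte ends the current word.
            words.append(current.decode("iso-8859-1"))
            current = bytearray()
    if current:
        words.append(current.decode("iso-8859-1"))
    return words


# UTF-8 continuation bytes are always in 0x80-0xBF, so under the
# assumed classification they are never letters and always end a word.
print(split_as_latin1("český".encode("utf-8")))
print(split_as_latin1(b"beer"))
```

Under these assumptions a Czech word such as "český" is indexed as fragments rather than as one term, while a pure-ASCII word survives intact, which would be consistent with Czech queries finding little or nothing.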
Re: [Gossip] search library battle royale
Hi Jeff,

I don't understand a lot of what's been said about evaluating these search engines, but I'll comment anyway.

Since, for me at least, easy to use but full-featured search software is a vital part of any archive, the ideal search engine should be powerful, yet intuitive to operate, or come with good instructions. A tutorial would be helpful.

I'd like to be able to search the results of an earlier search.

Mac Oglesby

> It's starting to get bloody. All right!
>
> By private email, one person noted that Xapian is not doing so hot on Czech. As in not returning search results. Another managed to craft a query that causes PyLucene to throw an exception. Jeff and I are patching up the staggering contenders as best we can, and then sending them right back out for more punishment.
>
> In the meantime, I'd love to hear some pointed comments on the user interface. Do people prefer Omega's search engine standard layout, or PyLucene's blend into service appearance?
>
> -Jeff
Re: [Gossip] search library battle royale
Jeff Breidenbach wrote:
> Ok, ok, you score some serious points here, especially on Asian languages, but Xapian holds its own on European languages. I think. Check out some Brazilian Portuguese action below. And I hear the Xapian team is working hard on full UTF-8 support.

Hope they aren't planning any holidays soon ;-)

http://www.mail-archive.com/cgi-bin/omega/omega?P=%C3%B8l+eftersmagDB=brygforum%40lists.haandbryg.dkFMT=queryxP=beer.xDB=brygforum%40lists.haandbryg.dkxFILTERS=--O

Check out the Danish characters - they are displayed as:
Firefox: reverse-field question marks
Explorer: Japanese/Chinese!

The mails look fine in the mail-archive view, it's just the search engine results that look ugly.

/Dan
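For reference, the search term in the URL quoted above can be percent-decoded with Python's standard library. This sketch only shows how the same bytes read under two different charsets; it does not establish what the browsers or the search page were actually doing. The variable names are chosen for this example.

```python
from urllib.parse import unquote_plus

# The P= value copied from the URL in the message above.
raw_term = "%C3%B8l+eftersmag"

# Decoded as UTF-8, the two bytes C3 B8 form the single letter "ø".
as_utf8 = unquote_plus(raw_term, encoding="utf-8")

# Decoded as ISO-8859-1, the same two bytes become two characters.
as_latin1 = unquote_plus(raw_term, encoding="iso-8859-1")

print(as_utf8)
print(as_latin1)
```

A mismatch of this kind between the charset of the data and the charset a page declares is one common way for accented letters to show up as garbage, though the exact symptom depends on the browser.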