Re: [Gossip] search library battle royale

2006-09-08 Thread Olly Betts
On Tue, Sep 05, 2006 at 09:39:46PM -0700, Jeff Breidenbach wrote:
 By private email, one person noted that Xapian is not doing
 so hot on Czech. As in not returning search results.

I'm a little surprised that anything gets found that isn't in ASCII
since Omega's indexer is generating words from utf-8 text but treating
it as iso-8859-1 (unless JeffB has patched it).  Since the second and
subsequent characters of any multibyte utf-8 sequence are symbols in
iso-8859-1, that means you'll get a word break after any non-ASCII
character, even if it's an accented letter.
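
To make the effect above concrete, here is a minimal Python sketch.  It
is not Omega's actual indexing code: the tokenizer is a toy, the
function names are made up for illustration, and the sample Czech word
is my own choice.  It only assumes that "word characters" means ASCII
alphanumerics plus the iso-8859-1 letters.

```python
import re

# Toy definition of a "word character" for an indexer that believes its
# input is iso-8859-1: ASCII alphanumerics plus the accented letters in
# 0xC0-0xFF (excluding the multiplication and division signs).
LATIN1_WORD = re.compile(r"[0-9A-Za-z\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff]+")


def latin1_words(utf8_bytes):
    """Split utf-8 bytes into words after misreading them as iso-8859-1."""
    text = utf8_bytes.decode("iso-8859-1")
    return LATIN1_WORD.findall(text)


def utf8_words(utf8_bytes):
    """Split the same bytes into words after decoding them correctly."""
    return re.findall(r"\w+", utf8_bytes.decode("utf-8"))


# Hypothetical sample: the Czech word "prilis" with its diacritics,
# written with escapes so the source file encoding does not matter.
sample = "p\u0159\u00edli\u0161".encode("utf-8")

print(utf8_words(sample))    # one word when decoded as utf-8
print(latin1_words(sample))  # several fragments when misread as iso-8859-1
```

Each accented letter becomes a lead byte that looks like a letter
(0xC3 or 0xC5 here) followed by a continuation byte in 0x80-0xBF, which
iso-8859-1 maps to a control code or a symbol, so the word is cut there.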

But anyway, utf-8 support for Xapian is pretty much written now.  I've
just sent JeffB a patch to try (but it'll require a reindex so it'll
probably not be live for a while).

 Another managed to craft a query that causes PyLucene to throw an
 exception.

Any query which it can't parse seems to do that.  Try searching
for * or , for example.

 In the meantime, I'd love to hear some pointed comments
 on the user interface. Do people prefer Omega's search engine
 standard layout, or PyLucene's blend into service appearance?

FWIW, I think I prefer the blend-in look.  But the look is essentially
orthogonal to the choice of engine - it's just a matter of slotting the
appropriate HTML into the templates.

Cheers,
Olly

___
Discussion list for The Mail Archive
Gossip@jab.org
http://jab.org/cgi-bin/mailman/listinfo/gossip


Re: [Gossip] search library battle royale

2006-09-06 Thread Mac Oglesby


Hi Jeff,

I don't understand a lot of what's been said about evaluating these 
search engines, but I'll comment anyway.


Since, for me at least, easy to use but full-featured search software 
is a vital part of any archive, the ideal search engine should be 
powerful, yet intuitive to operate, or come with good instructions. A 
tutorial would be helpful.


I'd like to be able to search the results of an earlier search.

Mac Oglesby







It's starting to get bloody. All right!

By private email, one person noted that Xapian is not doing
so hot on Czech. As in not returning search results. Another
managed to craft a query that causes PyLucene to throw an
exception. Jeff and I are patching up the staggering contenders
as best we can, and then sending them right back out for more
punishment.

In the meantime, I'd love to hear some pointed comments
on the user interface. Do people prefer Omega's search engine
standard layout, or PyLucene's blend into service appearance?

-Jeff



Re: [Gossip] search library battle royale

2006-09-05 Thread Dan Temple

Jeff Breidenbach wrote:

Ok, ok, you score some serious points here, especially on Asian
languages, but Xapian holds its own on European languages. I
think. Check out some Brazilian Portuguese action below.  And I
hear the Xapian team is working hard on full UTF-8 support.


Hope they aren't planning any holidays soon ;-)

http://www.mail-archive.com/cgi-bin/omega/omega?P=%C3%B8l+eftersmag&DB=brygforum%40lists.haandbryg.dk&FMT=query&xP=beer.&xDB=brygforum%40lists.haandbryg.dk&xFILTERS=--O

Check out the Danish characters - they are displayed as:

Firefox: reverse-field question marks
Explorer: Japanese/Chinese!

The mails look fine in the mail-archive view, it's just the search engine 
results that look ugly.
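
For anyone who wants to poke at this, here is a small Python sketch of
how the query term in that URL decodes.  The last step is only a guess
at one way the question marks could arise (iso-8859-1 bytes being read
as utf-8); I have not confirmed that this is what the results page
actually does.

```python
from urllib.parse import unquote_plus

# The P= value from the search URL above, still percent-encoded.
raw = "%C3%B8l+eftersmag"

# Decoded as utf-8 (the default), the two bytes C3 B8 form one letter.
query = unquote_plus(raw)

# Decoded as iso-8859-1, the same two bytes become two characters.
misread = unquote_plus(raw, encoding="iso-8859-1")

# Assumption: if the page sent iso-8859-1 bytes but the browser read
# them as utf-8, the lone byte F8 is invalid and gets replaced by
# U+FFFD, the replacement character browsers draw as a question mark.
garbled = query.encode("iso-8859-1").decode("utf-8", errors="replace")

print(query)
print(misread)
print(garbled)
```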


/Dan


