Re: [Gossip] Re: [Team] search library battle royale [draft]

2006-09-05 Thread Michal Suchanek

Hello

On 9/4/06, Jeff Marshall [EMAIL PROTECTED] wrote:

Lucene: it slices, it dices, it's twice as nice as...well, htDig 3.1 for
sure.

Here are a couple reasons why Lucene is the Battle Royale champ in my eyes:

* Powerful searching features:
http://lucene.apache.org/java/docs/queryparsersyntax.html - let's see
the other kids do all this.

* When was the last time you searched Unicode characters on The Mail
Archive?  That's right, never!  But with our experimental Lucene
interface, you can:



That's true. I tried to search for a Japanese list in the archive, and
found announce-jp%40jp.freebsd.org. I copied 議論 from a BSDCon
announcement there, and only managed to find it with Lucene.
For the other two, either I failed to configure them properly to look
into that list, or they are incapable of finding such a word.

So if I cared about searching in non-English lists, Lucene would be the
winner for me.

Thanks

Michal
___
Discussion list for The Mail Archive
Gossip@jab.org
http://jab.org/cgi-bin/mailman/listinfo/gossip


Re: [Gossip] Re: [Team] search library battle royale [draft]

2006-09-04 Thread Jeff Marshall
Lucene: it slices, it dices, it's twice as nice as...well, htDig 3.1 for 
sure.


Here are a couple reasons why Lucene is the Battle Royale champ in my eyes:

* Powerful searching features: 
http://lucene.apache.org/java/docs/queryparsersyntax.html - let's see 
the other kids do all this.


* When was the last time you searched Unicode characters on The Mail 
Archive?  That's right, never!  But with our experimental Lucene 
interface, you can: 
http://www.mail-archive.com/lucene/search.py?list=freebsd-users-jp%40jp.freebsd.org&query=%E3%81%95%E3%82%93
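
For anyone wondering what the escapes in that link mean: they look like ordinary percent-encoded UTF-8, with %40 standing for "@" and %E3%81%95%E3%82%93 for the two characters さん. Below is a minimal Python sketch of how such a query string could be built and decoded. The helper name build_search_query is made up for illustration, and this is not the code behind search.py; only the parameter names "list" and "query" are taken from the link.

```python
from urllib.parse import urlencode, parse_qs


def build_search_query(list_name, query):
    """Hypothetical helper: percent-encode the two parameters seen in the link.

    urlencode() encodes str values as UTF-8 by default and joins the
    parameters with '&'.
    """
    return urlencode({"list": list_name, "query": query})


# Encode a list address and a Japanese search term.
encoded = build_search_query("freebsd-users-jp@jp.freebsd.org", "さん")
print(encoded)

# Decoding goes the other way and recovers the original text.
decoded = parse_qs(encoded)
print(decoded["list"][0], decoded["query"][0])
```

The point is only that a Unicode search term survives the round trip through the URL; whether the engine behind it can then find the term is a separate question.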



Downside/upside: Xapian has the nice Omega web interface.  Lucene has 
Nutch, but for a few reasons it didn't fit our needs.  So, our web 
interface is coded up by hand using PyLucene/mod_python.  The downside is 
that it meant a bit more work for us, but the upside is complete 
configurability.  Missing right now is paging - it simply displays the 
best 100 matches.


Downside: our PyLucene interface doesn't have a killer name like Omega.

Jeff Marshall
[EMAIL PROTECTED]



Jeff Breidenbach wrote:

If you don't care about search,  don't read further.

===

Sunday, Sunday, SUNDAY!

Come see the data crunching, webpage hopping, free-styling
search library action. Two monster libraries, titans of Free
Software technology, compete to become the native search
engine for The Mail Archive.

Watch as Xapian Omega crushes and destroys the competition,
finishing off queries in milliseconds. This probabilistic juggernaut is
a battle tested, email chewing reigning champion in Europe. Honed
for years and more hardened than quartz, Jeff Breidenbach will
drive Xapian Omega during this Battle Royale.

PyLucene is a mild mannered garbage collectin' programming library
just like your mom's search index. That is, if your mom's search index
could jump partitions, crush gigabytes down to tiny segments, and
plow through millions of records. Forged on the anvil of a Xerox PARC
alumnus, brimming with black magic, Lucene will be wrought by the
indomitable Jeff Marshall.

We're taking these two byte belching, buffer oversized, monster libraries
and pitting them head to head. Old geezer HtDig 3.1 will also make a
final appearance in the arena.  All three engines can run on any list,
just by replacing gossip@jab.org with the listname of your choice.

Who will win the monster rally? Xapian vs Lucene? Jeff vs Jeff? Yes,
you decide! Send comments to gossip, or privately if you are shy,
for the next week or so. Bonus points for using phrases like
"slamming!", "spectacular" or "crushed like a bug". Who's got
the slickest user interface? Which contender has superior
data-crunching performance? How about grit, determination and
the baddest sounding name?

Want to see something tweaked? Have questions? Ask and it will
be done if humanly possible - this is a gritty bit-for-bit battle of
hotrod software and programmer ingenuity, no holds barred.  Ladies
and gentlemen...  Start your search engines!

HtDig3.1
http://www.mail-archive.com/cgi-bin/htsearch?config=gossip_jab_org&words=magically



Xapian Omega
http://www.mail-archive.com/cgi-bin/omega/omega?P=magically[EMAIL PROTECTED] 



PyLucene
http://www.mail-archive.com/lucene/[EMAIL PROTECTED]query=magically 


