Erik Hatcher wrote:


On Jan 10, 2005, at 6:54 PM, David Spencer wrote:

Hi... I wrote the WordNet sandbox code, but I'm not sure if I understand this thread. Are we saying that it does not work w/ the new WordNet data, or that the code in Erik's book is better, more up to date, etc.?


I have not tried the sandbox with any versions past WordNet 1.6. Karthik shows a Java API to it, which I have not used - only your code that parses the prolog files. So the book code explains exactly what is in the sandbox and describes WordNet 1.6 integration, though WordNet has evolved since.

If needed, I can update the sandbox code.


It'd be awesome to have current WordNet support - I haven't looked at what is involved in making it so.


I verified that the code works w/ the latest WordNet (2.0) without any problems. The relevant data from WordNet has not changed, so, for this package at least, there's no need to upgrade WordNet.

I added "query expansion", which takes a simple query string and, for every term, adds its synonyms. There's an optional boost parameter that can be used to "penalize" synonyms if you want to apply the heuristic that the user probably knows the right word.

For example, with the synonym boost set to 0.9, the query "big dog" expands to:

big adult^0.9 bad^0.9 bighearted^0.9 boastful^0.9 boastfully^0.9 bounteous^0.9 bountiful^0.9 braggy^0.9 crowing^0.9 freehanded^0.9 giving^0.9 grown^0.9 grownup^0.9 handsome^0.9 large^0.9 liberal^0.9 magnanimous^0.9 momentous^0.9 openhanded^0.9 prominent^0.9 swelled^0.9 vainglorious^0.9 vauntingly^0.9
dog andiron^0.9 blackguard^0.9 bounder^0.9 cad^0.9 chase^0.9 click^0.9 detent^0.9 dogtooth^0.9 firedog^0.9 frank^0.9 frankfurter^0.9 frump^0.9 heel^0.9 hotdog^0.9 hound^0.9 pawl^0.9 tag^0.9 tail^0.9 track^0.9 trail^0.9 weenie^0.9 wiener^0.9 wienerwurst^0.9


Amusingly then, documents with the terms "liberal wienerwurst" match "big dog"! :)
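
To make the expansion step concrete, here is a minimal sketch in plain Java. It is not the sandbox's SynExpand class and does not use the Lucene API: the class name, the expand() method, and the in-memory synonym map (standing in for the WordNet-derived synonym index) are all assumptions made up for illustration. It only shows the idea described above: each original term is kept unboosted, and each of its synonyms is appended with the boost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: a hypothetical stand-in, not the sandbox's SynExpand. */
final class SynonymExpansionSketch {

    /**
     * Expands each whitespace-separated term of the query with its synonyms.
     * Original terms are emitted as-is; each synonym gets "^boost" appended.
     * The synonym map is an assumed stand-in for a lookup in the synonym index.
     */
    static String expand(String query, Map<String, List<String>> synonyms, float boost) {
        List<String> parts = new ArrayList<>();
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank query: nothing to expand
            }
            parts.add(term); // the user's own word keeps full weight
            for (String syn : synonyms.getOrDefault(term, List.of())) {
                parts.add(syn + "^" + boost); // synonyms are "penalized" via the boost
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Tiny hand-picked subset of the synonyms listed above.
        Map<String, List<String>> synonyms = Map.of(
            "big", List.of("large", "liberal"),
            "dog", List.of("hound", "wienerwurst"));
        System.out.println(expand("big dog", synonyms, 0.9f));
    }
}
```

With the toy map above, main prints: big large^0.9 liberal^0.9 dog hound^0.9 wienerwurst^0.9. A boost below 1.0 means a document matching only synonyms scores lower than one matching the words the user actually typed.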

Javadoc is here:

http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/package-summary.html

The new query expansion is here:

http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/SynExpand.html


Want to try it out? This page *expands* a query and prints out the result (but doesn't execute it yet).
http://www.searchmorph.com/kat/synonym.jsp?syn=big


CVS tree here:

http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/

If you just want to use a prebuilt index, it's here (1MB):
http://searchmorph.com/pub/syn_index.zip

The prebuilt jar file is here:

http://www.searchmorph.com/pub/lucene-wordnet-dev.jar


Redundant weblog entry here:

http://www.searchmorph.com/weblog/index.php?id=34

Hope y'all like it and someone finds it useful,
  Dave

PS
Oh - it may need the 1.5 dev branch of Lucene to work. I'm not positive, but I tried to remove deprecation warnings, and doing so may have tied it to the latest code...



Erik


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]





