Jon Scott Stevens wrote:
> Adding support to Lucene for Nilsimsa seems like a cool idea...
>
> http://ixazon.dynip.com/~cmeclax/nilsimsa.html
>
> The index would be the hash and one could use Lucene to rank searches based
> on the Nilsimsa rating of the results...

Nilsimsa employs a very different model from Lucene's, so supporting it would require a rewrite of the indexing and search portions of Lucene, which is most of the code.

Nilsimsa appears to use what the literature calls a "signature file" approach, while Lucene uses an "inverted file". A search on Google for "signature file versus inverted index" turns up a paper by Zobel et al., which concludes:

    Our conclusions are unequivocal. For typical document indexing
    applications, current signature file techniques do not perform well
    compared to current implementations of inverted file indexes.

See: http://www.cs.columbia.edu/~pirot/cs6111/Readings/zobel98.pdf

Doug
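
To make the distinction between the two index structures concrete, here is a rough sketch in Java. It is only an illustration under stated assumptions: the class and method names (`IndexSketch`, `buildInverted`, `searchSignatures`, and so on) are hypothetical and are not part of Lucene, and the signature scheme shown is a generic superimposed-coding signature, not Nilsimsa's actual digest. The inverted file maps each term to the documents containing it, so a query touches only the entries for its terms. The signature file keeps one fixed-width bit pattern per document, so a query has to scan every signature and then verify the candidates, since hash collisions can produce false matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from Lucene or Nilsimsa.
public class IndexSketch {

    // Inverted file: term -> sorted ids of the documents that contain it.
    static Map<String, SortedSet<Integer>> buildInverted(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : docs.get(id).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) continue;
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // A lookup reads a single map entry, however many documents exist.
    static SortedSet<Integer> searchInverted(Map<String, SortedSet<Integer>> index, String term) {
        return index.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    // Signature file: one 64-bit signature per document (assumed width).
    static final int BITS = 64;

    // Each term sets one bit, chosen by its hash; all terms are OR-ed together.
    static long signature(String text) {
        long sig = 0L;
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) continue;
            sig |= 1L << Math.floorMod(term.hashCode(), BITS);
        }
        return sig;
    }

    static long[] buildSignatures(List<String> docs) {
        long[] sigs = new long[docs.size()];
        for (int id = 0; id < docs.size(); id++) {
            sigs[id] = signature(docs.get(id));
        }
        return sigs;
    }

    // A lookup scans every signature, then verifies candidates against the
    // document text, because two different terms may hash to the same bit.
    static List<Integer> searchSignatures(long[] sigs, List<String> docs, String term) {
        String wanted = term.toLowerCase();
        long query = signature(wanted);
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < sigs.length; id++) {
            boolean candidate = (sigs[id] & query) == query;
            if (candidate
                    && Arrays.asList(docs.get(id).toLowerCase().split("\\s+")).contains(wanted)) {
                hits.add(id);
            }
        }
        return hits;
    }
}
```

Both searches return the same documents for a single-term query. The difference is in the work done: the inverted lookup is one map access, while the signature search visits every document's signature, which is the kind of cost the comparison quoted above is concerned with.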
