RE: Lucene and SIPs

Larry Ogrodnek Thu, 22 Jun 2006 07:49:32 -0700

I didn't make too much progress, and kind of ended up dropping it.


One thing that I played with was creating multiple phrase indexes, one
each for 2, 3, 4, and 5 words.  I wrote a tokenizer that would batch up
the words, so, for the input string:

The quick brown fox jumps over the slow lazy dog.

The tokenizer for 3 words would return:

The quick brown
quick brown fox
brown fox jumps
fox jumps over
...
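
For anyone who wants to try the same idea, here is a minimal sketch of the word-batching step in plain Python. It is not the tokenizer described above (that code was not posted) and it does not use the Lucene API; the function name word_shingles and the regular expression used to split words are assumptions made for illustration.

```python
import re

def word_shingles(text, size):
    """Return every run of `size` consecutive words in `text`, in order."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes,
    # so the trailing period in the example sentence is dropped.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # A text with fewer than `size` words yields an empty list.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]

sentence = "The quick brown fox jumps over the slow lazy dog."

# First four 3-word batches, matching the listing above.
for phrase in word_shingles(sentence, 3)[:4]:
    print(phrase)

# One list per phrase index (2, 3, 4, and 5 words).
phrase_indexes = {n: word_shingles(sentence, n) for n in (2, 3, 4, 5)}
```

In this sketch each entry of phrase_indexes stands in for one of the separate phrase indexes; in a real setup each batch would be indexed as a single term.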

 

This seemed like a reasonable start... the problem is resolving the
overlap for display, and figuring out which words are the most
important. E.g. if the above sentence itself was pretty rare, and you're
looking at the phrase-index-3, each one of its sub-phrases would end up
being significant... Which one do you show?  Or do you combine them
into a longer phrase?  If so, where do you stop?
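
One possible take on the "combine them" option, sketched under assumptions: treat each significant 3-word phrase as a span of word offsets and greedily merge spans that share at least one word. This is only an illustration, not something that was tried in the experiment above. The function name merge_overlapping is hypothetical, and the sketch assumes the offsets of significant phrases are already known; how to score significance is left open.

```python
def merge_overlapping(words, starts, size):
    """Merge significant `size`-word phrases that overlap into longer phrases.

    `words` is the tokenized sentence; `starts` holds the word offsets of
    the phrases judged significant (an assumed input, not computed here).
    """
    phrases = []
    run_start = run_end = None
    for s in sorted(starts):
        if run_end is not None and s < run_end:
            # This phrase shares a word with the current run: extend the run.
            run_end = max(run_end, s + size)
        else:
            # No overlap: emit the finished run (if any) and start a new one.
            if run_start is not None:
                phrases.append(" ".join(words[run_start:run_end]))
            run_start, run_end = s, s + size
    if run_start is not None:
        phrases.append(" ".join(words[run_start:run_end]))
    return phrases

words = "The quick brown fox jumps over the slow lazy dog".split()

# Suppose the 3-word phrases starting at offsets 0, 1, 2 and 6 are significant.
print(merge_overlapping(words, {0, 1, 2, 6}, 3))
# ['The quick brown fox jumps', 'the slow lazy']
```

If every 3-word phrase in a rare sentence is significant, this merge returns the whole sentence, which is exactly the "where do you stop" problem; a length cap or a per-phrase score threshold would be needed on top of it.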

 

It seemed like an easy first approach to try out, but I'm not sure it's
even in the right direction...

________________________________

From: Nader Akhnoukh [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 21, 2006 8:14 PM
To: Larry Ogrodnek
Subject: Lucene and SIPs

 

Hi Lawrence, I saw a posting to the Lucene group you made in February
concerning using Lucene to find SIPs.

Did you make any progress with this?  I'm able to find significant
single terms, but am stumped by phrases. 

Thanks,
Nader
