Re: Next Word - Any Suggestions?

mark harwood Tue, 26 Oct 2010 05:43:57 -0700

See the Collocation stuff here: https://issues.apache.org/jira/browse/LUCENE-474

----- Original Message ----
From: Lucene <[email protected]>
To: [email protected]
Sent: Tue, 26 October, 2010 13:27:06
Subject: Next Word - Any Suggestions?

I'm about to implement a custom query that is sort of a mash-up of Facets,
Highlighting, and SpanQuery, but thought I'd first see if anyone has done
anything similar.

Put simply, I need to facet on the next word after a given target word.

For example, suppose my index held only the following 5 documents (each
consisting of a single sentence):

Doc 1 - The quick brown fox jumped over the fence.

Doc 2 - The sly fox skipped over the fence.

Doc 3 - The fat fox skipped his afternoon class.

Doc 4 - A brown duck and red fox, crashed the party.

Doc 5 - Charles Brown! Fox! Crashed my damn car.

The query should give the frequency of each distinct term immediately after
the word "fox":

skipped - 2

crashed - 2 

jumped - 1

Longer term, I'd also like the opposite: the frequency of each distinct term
immediately before the word "fox":

brown - 2

sly - 1

fat - 1 

red - 1
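
As a sanity check on the expected counts, here is a minimal sketch in plain
Python (not Lucene code). The names DOCS, tokenize and neighbor_counts are
hypothetical, and the tokenizer is an assumption: it lowercases the text and
drops punctuation, so sentence boundaries such as "Brown! Fox! Crashed" are
ignored, which is what the counts in this message imply.

```python
import re
from collections import Counter

# The five example documents from the message.
DOCS = [
    "The quick brown fox jumped over the fence.",
    "The sly fox skipped over the fence.",
    "The fat fox skipped his afternoon class.",
    "A brown duck and red fox, crashed the party.",
    "Charles Brown! Fox! Crashed my damn car.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def neighbor_counts(docs, target, offset=1):
    """Count the terms found `offset` positions away from each `target` hit.

    offset=1 counts the next word, offset=-1 counts the previous word.
    """
    counts = Counter()
    for text in docs:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            j = i + offset
            if token == target and 0 <= j < len(tokens):
                counts[tokens[j]] += 1
    return counts

after_fox = neighbor_counts(DOCS, "fox", offset=1)
before_fox = neighbor_counts(DOCS, "fox", offset=-1)
print(sorted(after_fox.items()))
print(sorted(before_fox.items()))
```

Running this over the five sentences reproduces both lists: skipped 2,
crashed 2, jumped 1 after "fox", and brown 2, sly 1, fat 1, red 1 before it.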

My guess is that either the FastVectorHighlighter or SpanQuery would be a
reasonable starting point. I was hoping to take advantage of Vectors as I am
storing termVectors, termPositions, and termOffsets for the field in
question.
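
To illustrate why stored positions might help, here is a rough sketch in
plain Python of the lookup a per-document term vector with positions would
allow. This is not the Lucene API: build_term_vector and next_terms are
hypothetical helpers, and the "vector" is just a dict from term to token
positions. It also assumes consecutive positions with no gaps, which may
not hold if the analyzer removes stop words.

```python
import re
from collections import Counter, defaultdict

def build_term_vector(text):
    """Map each term to the list of token positions where it occurs."""
    vector = defaultdict(list)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for position, term in enumerate(tokens):
        vector[term].append(position)
    return dict(vector)

def next_terms(vector, target, offset=1):
    """Count the terms sitting `offset` positions from each `target` hit."""
    # Invert the vector so that a position can be looked up directly.
    by_position = {
        position: term
        for term, positions in vector.items()
        for position in positions
    }
    counts = Counter()
    for position in vector.get(target, []):
        neighbor = by_position.get(position + offset)
        if neighbor is not None:
            counts[neighbor] += 1
    return counts

vector = build_term_vector("The sly fox skipped over the fence.")
print(next_terms(vector, "fox"))             # term right after "fox"
print(next_terms(vector, "fox", offset=-1))  # term right before "fox"
```

Summing these per-document counters over the matching documents would give
the facet counts; whether that is fast enough on a large index is an open
question this sketch does not answer.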

Grateful for any thoughts . . . reference implementations . . . words of
encouragement . . . free beer - whatever you can offer.

Gracias,

Christopher
