[ http://issues.apache.org/jira/browse/LUCENE-474?page=all ]

Mark Harwood updated LUCENE-474:
--------------------------------

    Attachment: colloc.zip

Here's some code that I've used before to find phrases in an index (see
CollocationFinder.java).
If your index has term vector support enabled, you can run it to mine the
collocated terms. This is typically a long-running operation that you don't
want to do too often.
The CollocationIndexer can be used to store the mined collocations in an index.
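To illustrate the idea of mining collocations, here is a minimal sketch. It is not the CollocationFinder from colloc.zip. It assumes each document's terms are already available in position order, as a term vector with positions would supply them, and the class name CollocationSketch and its methods are hypothetical. It counts ordered term pairs that fall within a small positional window and ranks the frequent pairs with a Dice-style score, 2 * count(a b) / (count(a) + count(b)):

```java
import java.util.*;

/**
 * Hypothetical sketch, not the CollocationFinder in colloc.zip: counts ordered
 * term pairs occurring within a positional window and scores them.
 */
class CollocationSketch {
    // Occurrences of each individual term across all documents.
    private final Map<String, Integer> termCounts = new HashMap<>();
    // Occurrences of ordered pairs "a b" where b follows a within the window.
    private final Map<String, Integer> pairCounts = new HashMap<>();
    private final int window;

    CollocationSketch(int window) {
        this.window = window;
    }

    /** Adds one document, assumed to be given as its terms in position order. */
    void addDocument(List<String> terms) {
        for (int i = 0; i < terms.size(); i++) {
            termCounts.merge(terms.get(i), 1, Integer::sum);
            // Pair term i with every term up to `window` positions after it.
            for (int j = i + 1; j <= i + window && j < terms.size(); j++) {
                pairCounts.merge(terms.get(i) + " " + terms.get(j), 1, Integer::sum);
            }
        }
    }

    /** Returns pairs seen at least minCount times, ordered by descending Dice-style score. */
    Map<String, Double> collocations(int minCount) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
            if (e.getValue() < minCount) {
                continue;
            }
            String[] ab = e.getKey().split(" ");
            double dice = 2.0 * e.getValue() / (termCounts.get(ab[0]) + termCounts.get(ab[1]));
            scored.add(new AbstractMap.SimpleEntry<>(e.getKey(), dice));
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scored) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

With a window of 1, only adjacent terms are paired. Across the documents "new york city", "new york times" and "the city of new york", the pair "new york" is the only one seen at least twice, and its score is 2 * 3 / (3 + 3) = 1.0. Note that for windows larger than 1 the score is not bounded by 1, because one occurrence of a term can take part in several pairs.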

Possible uses for collocations are:
* automatically identifying candidate terms in a query that can be turned into
a phrase query
* better spelling correction, by using all the terms in the query as context to
pick the most likely spelling variation
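A minimal sketch of the first use follows. It assumes a map of mined collocation scores keyed as "a b", such as the mining step might produce, and a hypothetical PhraseSuggester class that is not part of Lucene or of colloc.zip. The sketch walks the query terms from left to right and greedily merges an adjacent pair into a quoted phrase whenever the pair's score reaches a chosen threshold:

```java
import java.util.*;

/**
 * Hypothetical helper: turns adjacent query terms that form a known
 * collocation into quoted phrase clauses.
 */
class PhraseSuggester {
    private final Map<String, Double> collocationScores;
    private final double threshold;

    PhraseSuggester(Map<String, Double> collocationScores, double threshold) {
        this.collocationScores = collocationScores;
        this.threshold = threshold;
    }

    /** Returns query clauses; collocated adjacent pairs are merged into quoted phrases. */
    List<String> suggest(List<String> queryTerms) {
        List<String> clauses = new ArrayList<>();
        int i = 0;
        while (i < queryTerms.size()) {
            if (i + 1 < queryTerms.size()) {
                String pair = queryTerms.get(i) + " " + queryTerms.get(i + 1);
                Double score = collocationScores.get(pair);
                if (score != null && score >= threshold) {
                    // Greedy choice: consume both terms as one phrase.
                    clauses.add("\"" + pair + "\"");
                    i += 2;
                    continue;
                }
            }
            clauses.add(queryTerms.get(i));
            i++;
        }
        return clauses;
    }
}
```

Suppose "new york" scores 1.0, "york pizza" scores 0.2, and the threshold is 0.5. Then the query terms cheap, new, york, pizza become the clauses cheap, "new york", pizza. Joined with spaces, these form a string in the classic QueryParser syntax, where double quotes denote a phrase. Because the matching is greedy, the sketch never weighs overlapping pairs against each other.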

I haven't done too much with this code, but I've added it here because it
sounds like it could be relevant.

Cheers
Mark



> High Frequency Terms/Phrases at the Index level
> -----------------------------------------------
>
>          Key: LUCENE-474
>          URL: http://issues.apache.org/jira/browse/LUCENE-474
>      Project: Lucene - Java
>         Type: New Feature
>     Versions: 1.4
>     Reporter: Suri Babu B
>  Attachments: colloc.zip
>
> We should be able to find all the high-frequency terms/phrases (where
> frequency is the search criterion/benchmark).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

