[ http://issues.apache.org/jira/browse/LUCENE-474?page=all ]
Mark Harwood updated LUCENE-474:
--------------------------------
Attachment: colloc.zip
Here's some code that I've used before to find phrases in an index - see
CollocationFinder.java.
If your index has term vector support enabled, you can run it to mine the
collocated terms. This is typically a long operation that you don't want to run
too often.
The CollocationIndexer can be used to store the mined collocations in an index.
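
As a rough illustration of the mining step (this is not the code in colloc.zip,
and it does not touch the Lucene API), the sketch below counts adjacent term
pairs across documents and keeps the frequent ones. The class name, the method
names, and the use of plain token lists in place of real term vectors are all
assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: token lists stand in for per-document term vectors.
public class CollocationSketch {

    // Counts how often each ordered pair of adjacent terms occurs across all documents.
    public static Map<String, Integer> countAdjacentPairs(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> terms : docs) {
            for (int i = 0; i + 1 < terms.size(); i++) {
                String pair = terms.get(i) + " " + terms.get(i + 1);
                counts.merge(pair, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keeps only the pairs seen at least minCount times, sorted by pair text.
    public static Map<String, Integer> frequentPairs(Map<String, Integer> counts, int minCount) {
        Map<String, Integer> frequent = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                frequent.put(e.getKey(), e.getValue());
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("new", "york", "city", "hall"),
            Arrays.asList("visit", "new", "york", "city"),
            Arrays.asList("a", "new", "city", "hall"));
        System.out.println(frequentPairs(countAdjacentPairs(docs), 2));
    }
}
```

A raw frequency cutoff is the simplest possible filter; a real miner would
probably also want to discount pairs made up of very common terms.
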
Possible uses for collocations are:
* automatically identifying candidate terms in a query that can be turned into
a phrase query
* better spelling correction, using all the terms in the query as context to
pick the most likely spelling variation
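
To make the first use concrete, here is a minimal sketch of how known
collocations could be used to spot phrase candidates in a query. It assumes the
collocations have already been mined into a plain set of "term term" strings;
the class and method names are hypothetical, and the only Lucene convention
relied on is that double quotes mark a phrase in query syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: rewrites adjacent query terms that form a known
// collocation into a quoted phrase, leaving other terms unchanged.
public class PhraseCandidateSketch {

    public static List<String> toQueryParts(List<String> queryTerms, Set<String> collocations) {
        List<String> parts = new ArrayList<>();
        int i = 0;
        while (i < queryTerms.size()) {
            boolean hasNext = i + 1 < queryTerms.size();
            if (hasNext && collocations.contains(queryTerms.get(i) + " " + queryTerms.get(i + 1))) {
                // Known collocation: emit both terms as one quoted phrase.
                parts.add("\"" + queryTerms.get(i) + " " + queryTerms.get(i + 1) + "\"");
                i += 2;
            } else {
                parts.add(queryTerms.get(i));
                i++;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> collocations = new HashSet<>(Arrays.asList("new york"));
        List<String> query = Arrays.asList("cheap", "new", "york", "hotels");
        System.out.println(String.join(" ", toQueryParts(query, collocations)));
    }
}
```

The scan is greedy and only looks at two-term collocations, which keeps the
example short; longer phrases would need a lookup over wider windows.
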
I haven't done much with this code, but I've added it here because it sounds
like it could be relevant.
Cheers
Mark
> High Frequency Terms/Phrases at the Index level
> -----------------------------------------------
>
> Key: LUCENE-474
> URL: http://issues.apache.org/jira/browse/LUCENE-474
> Project: Lucene - Java
> Type: New Feature
> Versions: 1.4
> Reporter: Suri Babu B
> Attachments: colloc.zip
>
> We should be able to find all the high-frequency terms/phrases (where
> frequency is the search criterion / benchmark)