Re: Top 10 words

Jigar Shah Fri, 13 Feb 2015 09:43:09 -0800

If the words come from known fields in the documents, you can extract them
while indexing and index them as facets. Lucene supports faceted search,
which can return the top n values and their counts for such fields; that is
much more efficient than counting words at query time.
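
To make the "Top n counts" idea concrete, here is a rough sketch in plain
Java (no Lucene) of what such a count computes over the matched documents.
The class TopWords, the method topN, and the naive tokenizer are
illustrative assumptions, not Lucene's facet API; a real setup would leave
tokenizing and counting to the analyzer and the facet module.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: counts words across the matched
// documents and keeps the n most frequent ones.
class TopWords {

    static List<Map.Entry<String, Integer>> topN(List<String> docs, int n) {
        // Pass 1: count every word across all documents.
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Naive tokenizer (an assumption): lowercase, then split on
            // anything that is not a letter or a digit.
            String[] words = doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
            for (String word : words) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }

        // Ordering for the final result: highest count first, ties broken
        // alphabetically so the output is deterministic.
        Comparator<Map.Entry<String, Integer>> best =
                Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey());

        // Pass 2: bounded heap whose head is the weakest entry kept so far,
        // so it never holds more than n entries after each step.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(best.reversed());
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the current weakest entry
            }
        }

        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort(best);
        return result;
    }
}
```

Calling TopWords.topN(matchedDocs, 10) returns up to ten (word, count)
entries, most frequent first. The heap stays at n entries, but the count
map still grows with the vocabulary, which is part of why counting at
index time scales better.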


Another option is to apply a clustering algorithm to the results, which can
provide the top n words. For an example, see http://search.carrot2.org




On Fri, Feb 13, 2015 at 10:13 PM, Maisnam Ns <[email protected]> wrote:

> Hi,
>
> Can someone help me with this use case:
>
> 1. I have to search for a string and, let's say, the search engine (it is
> not Lucene) found this string in 100,000 documents. I need to find the top
> 10 words occurring in these 100,000 documents. As the document size is
> large, how do I further index these documents and find the top 10 words?
>
> 1. I am thinking of using Lucene's RAMDirectory or memory indexing to find
> the 10 most frequently occurring words.
> 2. Is this the right approach? Indexing and writing to disk would be
> almost overkill, and a user can search any number of times.
>
> Thanks in advance.
>
