If the fields in your documents are known in advance, you can extract the words at index time and create facets from them. Lucene supports faceted search, which can give you top-n counts for such fields. That is much more efficient than indexing the results again for every search.
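To make the goal concrete, here is a minimal plain-Java sketch of the top-n word count itself. It deliberately does not use Lucene's facet API. The class name `TopWords`, the method `topN`, and the naive tokenizer are all hypothetical, and it assumes the text of the matching documents fits in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts words across documents.
class TopWords {

    // Returns the n most frequent words across all documents, most frequent first.
    static List<Map.Entry<String, Integer>> topN(List<String> docs, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Naive tokenizer (an assumption): lower-case the text and split on
            // runs of characters that are neither letters nor digits.
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically so the order is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Lucene facets count terms",
                "lucene index terms",
                "Lucene search");
        for (Map.Entry<String, Integer> e : topN(docs, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

A facet-based index does this counting per field at index time instead of per query, which is why it scales better when the same documents are searched repeatedly.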
Another option is to apply a clustering algorithm to the results, which can provide the top-n words; see http://search.carrot2.org

On Fri, Feb 13, 2015 at 10:13 PM, Maisnam Ns <maisnam...@gmail.com> wrote:

> Hi,
>
> Can someone help me with this use case:
>
> 1. I have to search a string and let's say the search engine(it is not
> lucene) found this string in 100,000 documents. I need to find the top 10
> words occurring in this 100000 documents.As the document size is large how
> to further index these documents and find the top 10 words
>
> 1. I am thinking of using Lucene Ramdirectory or memory indexing and find
> the most occurring top 10 words.
> 2. Is this the right approach , indexing and writing to the disk would be
> almost over kill and a user can search any number of times.
>
> Thanks in advance.