Does this question make sense? What I want is to compute term statistics over a corpus, and then use these statistics when doing scoring + retrieval using a MemoryIndex or RAMDirectory.
How can I do that? Thanks, Joseph On Thu, Oct 28, 2010 at 8:06 PM, Joseph Turian <tur...@gmail.com> wrote: > How do I use MemoryIndex or RAMDirectory, but score using term statistics > from a corpus given during preprocessing? > > Let's say I want to use a MemoryIndex or RAMDirectory to store a *single* > document, and then run a query against it, and get the score of the query > using just this one document. > I know how to do this. See, for some example code, this blog post on > persistent search: > http://www.sajalkayan.com/prospective-search-using-python.html > > Now what I want to do is take a *corpus* of K documents, and "index" it > during preprocessing to calculate the term statistics (e.g. idf). > I then want to freeze these term statistics, and use them whenever I insert > and compute the query score of a new document. > I.e. I want to *quickly* query a new document, using preprocessed term > statistics in the scoring function. > How can I do this? > > Thanks, > Joseph >