On Mon, Apr 12, 2010 at 9:50 AM, Herbert L Roitblat <h...@orcatec.com> wrote: > Thank you Michael. Your suggestions are helpful. I inherited all of the > code that uses pyLucene and don't consider myself an expert on it, so I very > much appreciate your suggestions. > > It does not seem to be the case that these elements represent the index of > the collection. TermInfo and Term grow as I retrieve more documents. There > was no trouble building the index.
Sorry I meant the "terms dict index". Lucene internally loads certain data structures (like the terms dict index, seen as many TermInfo/Term/String objects) from the index into RAM, so a heap dump of even a single open IndexReader is expected to show a great many TermInfo/Term/String objects (though in 3.1 this is greatly improved). > The contents of these fields are the tokens (some fields are tokenized, > others not) of the document fields. In the tokenized fields, there is one > object for each word. They seem to be in order of the documents for which > the term vectors are being sought. So these objects seem to represent a > "concatenation" of all of the documents being considered in order, and if > they are never removed, would always overwhelm the heap with a large > document set. They are not the index in the usual sense, I think. Before I > start retrieving documents, there is barely anything in these objects. > > What is holding the document contents in the heap after the fields > information is returned? > > Can you say more about incRef/decRef? I deleted all variables that > interacted with Lucene and it seems to have made no difference > > There are not a lot of different fields, I would say on the order of 50 with > about 20 of them in virtually every document. Can you run CheckIndex (java -ea -cp /path/to/lucene-core-XXX.jar org.apache.lucene.index.CheckIndex /path/to/your/index) and post the output? > It uses: > lucene.IndexReader.open(self._store) This is perfectly correct. If you're not using incRef/decRef on this reader then, then calling close on it will close it. Mike --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org