Ah yes, that is the way to go. It is a bit harder here, because we also use a per-user InMemoryIndex that is combined in a multi-reader, so it will be a bit more work, but I think it will be doable. Thanks for all the help.
That said, I found this issue hard to debug. Are there methods (on the IndexWriter, or text in the infoStream?) that I could have used to detect what was going on? That might be helpful for others as well.

-Rob

On Tue, Nov 10, 2015 at 1:32 PM, Jürgen Albert <j.alb...@data-in-motion.biz> wrote:

> Hi Rob,
>
> we use a SearcherManager to obtain a fresh Searcher for every query. From
> the Searcher we get the Reader. After the query you call
> searcherManager.release(searcher). The SearcherManager takes care of the
> rest.
>
> Regards,
>
> Jürgen.
>
> On 10.11.2015 at 13:27, Rob Audenaerde wrote:
>
>> Hi Jürgen, Michael
>>
>> Thanks! I seem to be able to reduce the index size by closing and
>> restarting our application. This reduces the index size from 22G to 4G,
>> which is roughly the expected size. The infoStream also gives me
>> 'removing unreferenced file' messages (IFD 0 [2015-11-10T12:21:49.293Z; main]: init:
>> removing unreferenced file '...)
>>
>> Now I just need to figure out how to close the IndexReader while keeping
>> the application running. I guess I should/could do something with
>> openIfChanged. Will look further.
>>
>> -Rob
>>
>> On Tue, Nov 10, 2015 at 12:19 PM, Jürgen Albert <j.alb...@data-in-motion.biz> wrote:
>>
>>> Hi Rob,
>>>
>>> we had a similar problem. In our case we had open index readers that
>>> blocked the index from merging its segments and thus from deleting the
>>> marked segments.
>>>
>>> Regards,
>>>
>>> Jürgen.
>>>
>>> On 06.11.2015 at 08:59, Rob Audenaerde wrote:
>>>
>>>> Hi will, others
>>>>
>>>> Thanks for your reply.
>>>>
>>>> As far as I understand it, deleting a document just sets the deleted
>>>> bit, and the documents are removed when segments are merged. (I am not
>>>> really sure what this means exactly; I guess the document gets removed
>>>> from the store and the terms will no longer refer to that document. Not
>>>> sure if terms get removed if no longer needed, etc.)
>>>> If there are resources to read to improve my understanding, I have not
>>>> found them (yet); if you could point me to some, that would be great!
>>>>
>>>> I use the default IndexWriterConfig, which I see uses TieredMergePolicy.
>>>> I never close my IndexWriter; as I use NRT searching, I just always keep
>>>> it open.
>>>>
>>>> My two guesses are that: a) old segments are not removed from disk, or
>>>> b) deletes are not cleaned up as well as I thought they would be.
>>>>
>>>> I have made a test case which indexes 5 million rows (five iterations,
>>>> five indexing threads, indexing and deleting all such documents after
>>>> each iteration with deleteByQuery), with the rows randomly generated. I
>>>> see the taxonomy ever growing (which is logical, because facet ordinals
>>>> are never removed as far as I understand); the index grows, but also
>>>> shrinks when deleting. So I cannot reproduce my problem easily :(
>>>>
>>>> I will start diving into the Lucene source code, but I was hoping I just
>>>> did something wrong.
>>>>
>>>> Any hints are appreciated!
>>>>
>>>> -Rob
>>>>
>>>> On Thu, Nov 5, 2015 at 2:52 PM, will <wmartin...@gmail.com> wrote:
>>>>
>>>>> Hi Rob:
>>>>>
>>>>> Do you understand how deletes work and how an index is compacted?
>>>>>
>>>>> There are some configuration/runtime activities you don't mention....
>>>>> And you make your testing process sound like a mirror of production?
>>>>> (Including configuration?)
>>>>>
>>>>> -will
>>>>>
>>>>> On 11/5/15 7:33 AM, Rob Audenaerde wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm currently investigating an issue we have with our index. It keeps
>>>>>> getting bigger, and I don't get why.
>>>>>>
>>>>>> Here is our use case:
>>>>>>
>>>>>> We index a database of about 4 million records, spread over a few
>>>>>> hundred tables. The data consists of a mix of text, dates, numbers
>>>>>> etc. We also add all these fields as facets.
>>>>>> Each night we delete about 90% of the data, which in testing reduces
>>>>>> the index size significantly.
>>>>>> We store the data as StoredFields as well, to avoid having to access
>>>>>> the database at all.
>>>>>> We use FloatAssociatedFacet fields for the facets.
>>>>>>
>>>>>> In production, however, it seems the index is only growing, up to
>>>>>> 71 GB for these records after a month of running.
>>>>>>
>>>>>> It seems that Lucene's index is just getting bigger there.
>>>>>>
>>>>>> We use Lucene 5.3 on CentOS, Java 8 64-bit.
>>>>>>
>>>>>> The taxonomy index does not grow significantly.
>>>>>>
>>>>>> How should I go about checking what is wrong?
>>>>>>
>>>>>> Thanks!
>>>
>>> --
>>> Jürgen Albert
>>> Managing Director
>>>
>>> Data In Motion UG (haftungsbeschränkt)
>>>
>>> Kahlaische Str. 4
>>> 07745 Jena
>>>
>>> Mobile: 0157-72521634
>>> E-Mail: j.alb...@datainmotion.de
>>> Web: www.datainmotion.de
>>>
>>> XING: https://www.xing.com/profile/Juergen_Albert5
>>>
>>> Legal
>>>
>>> Jena HBR 507027
>>> VAT ID: DE274553639
>>> Tax no.: 162/107/04586
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
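
On the question of how to check what is going on: one low-tech option, independent of the Lucene API, is to group the files in the index directory by segment and sum their sizes. If many old, large segments are still on disk long after the nightly delete, something (such as a reader that was never closed) is probably still referencing them. The sketch below is not from the thread. It assumes the usual Lucene file naming, where per-segment files start with an underscore followed by the segment name (for example `_a.cfs` or `_a_Lucene50_0.tim`); the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Sums on-disk bytes per Lucene segment, using only the file names (hypothetical helper). */
final class SegmentSizes {

    /**
     * Maps an index file name to its segment name, e.g. "_a.cfs" and
     * "_a_Lucene50_0.tim" both map to "_a". Files that do not start with an
     * underscore (segments_N, write.lock) are grouped under "(other)".
     */
    static String segmentOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return "(other)";
        }
        int end = fileName.length();
        for (int i = 1; i < fileName.length(); i++) {
            char c = fileName.charAt(i);
            if (c == '_' || c == '.') {
                end = i; // the segment name stops at the first '_' or '.' after the leading '_'
                break;
            }
        }
        return fileName.substring(0, end);
    }

    /** Returns total bytes per segment for the regular files directly inside indexDir. */
    static Map<String, Long> sizesBySegment(Path indexDir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                try {
                    String segment = segmentOf(file.getFileName().toString());
                    totals.merge(segment, Files.size(file), Long::sum);
                } catch (NoSuchFileException e) {
                    // The writer may delete a file while we are scanning; skip it.
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        sizesBySegment(indexDir).forEach(
            (segment, bytes) -> System.out.printf("%-10s %,15d bytes%n", segment, bytes));
    }
}
```

Running this against the index directory before and after the nightly delete (or before and after an application restart) gives a rough picture of which segments account for the growth. It only reads file names and sizes, so it cannot tell why a segment is still referenced; the infoStream output remains the place to look for that.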