Unclosed readers can definitely cause problems with index size, by preventing the deletion of merged-away segments. lsof can be useful for diagnosing that.
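On Linux, the leak Ian describes shows up in lsof output as index files that are marked deleted but are still open or memory-mapped. When the IndexWriter and its readers run in the same JVM, roughly the same check can be done from inside the process by reading /proc/self/fd and /proc/self/maps. This is a minimal sketch. It assumes a Linux /proc filesystem and the kernel's convention of appending " (deleted)" to the path of an unlinked file. The DeletedButOpen class is hypothetical and is not part of Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of Lucene): lists files under indexDir that this JVM
 * still holds open or memory-mapped although they have been deleted. Linux-only.
 */
final class DeletedButOpen {
    private static final String DELETED = " (deleted)";

    static Set<String> find(Path indexDir) {
        Set<String> result = new TreeSet<>();
        if (!Files.isDirectory(Paths.get("/proc/self"))) {
            return result; // no /proc: not Linux, nothing to report
        }
        try {
            // The kernel reports canonical paths, so resolve symlinks in the directory name.
            String prefix = indexDir.toRealPath().toString() + "/";
            // 1) Open file descriptors, e.g. an index input that was never closed.
            try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc/self/fd"))) {
                for (Path fd : fds) {
                    try {
                        addIfDeleted(Files.readSymbolicLink(fd).toString(), prefix, result);
                    } catch (IOException e) {
                        // descriptor closed while we were scanning; skip it
                    }
                }
            }
            // 2) Memory mappings: MMapDirectory maps index files, and a live mapping keeps a
            //    deleted file's blocks allocated even after its descriptor has been closed.
            try (BufferedReader maps = Files.newBufferedReader(Paths.get("/proc/self/maps"))) {
                String line;
                while ((line = maps.readLine()) != null) {
                    // fields: address perms offset dev inode pathname (pathname may contain spaces)
                    String[] fields = line.trim().split("\\s+", 6);
                    if (fields.length == 6) {
                        addIfDeleted(fields[5], prefix, result);
                    }
                }
            }
        } catch (IOException e) {
            // index directory missing or /proc unreadable: report nothing in this sketch
        }
        return result;
    }

    private static void addIfDeleted(String target, String prefix, Set<String> out) {
        if (target.startsWith(prefix) && target.endsWith(DELETED)) {
            out.add(target.substring(0, target.length() - DELETED.length()));
        }
    }
}
```

Calling DeletedButOpen.find(indexDir) a few times after commits and searcher refreshes would be expected to come back empty once every acquired searcher has been released. Names that persist across several such checks hint that some reader or other handle was never closed.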
As to the rest, I for one have lost track of what problems you've got with which of your indexes. I suggest you remove the forceMerge call, double-check for unclosed readers or anything else hanging on to index files, then post a new message if you've still got problems.

--
Ian.

On Mon, Jan 19, 2015 at 2:16 PM, Jürgen Albert <j.alb...@data-in-motion.biz> wrote:
> Hi,
>
> On 19.01.2015 at 14:13, Uwe Schindler wrote:
>>
>> Hi,
>>
>>> We use 4.8.1. We know that the javadoc advises against it. As I wrote, the
>>> deletion of old documents (which appear during an update) would be done
>>> while closing the writer.
>>
>> This is not true. The merge policy continuously merges segments that
>> contain deletions. The problem you might have is the following:
>> If you call forceMerge(1) for the first time, your index is reduced from a
>> well-distributed multi-segment index to one single, large segment. If you
>> then apply deletes, they are applied against this large segment. Newly added
>> documents are added to new segments. Those new segments are small, so they
>> are preferred for merging. The deletions in the huge single segment are
>> very unlikely to be merged away, because Lucene only touches this segment as
>> a last resort. So the problem starts when you call forceMerge for the first
>> time!
>>
>> If you don't call forceMerge and index continuously, your deletions will be
>> removed quite fast. This is especially true if the deletions are
>> well-distributed over the whole index! There are tons of instances with
>> Elasticsearch and Lucene doing this all the time. They never ever close
>> their writer. Be sure to use TieredMergePolicy (the default), because this
>> one prefers segments that have many deletions. The old LogMergePolicy does
>> not take deletes into account and should no longer be used, unless you rely
>> on a specific index order of your documents.
>
> We use the default, which is the TieredMergePolicy as far as I can see.
> If what you write is true, I wonder why our index started growing in the first
> place. We have 2 indices: in the bigger one, every document receives an update
> every couple of days; in the smaller one, every document is updated at random
> over a period of roughly 3 minutes. After a couple of days, each index had
> grown to 12 GB (the bigger one started at 2 GB and the smaller one at a couple
> of megabytes). This should not happen if the MergePolicy works as intended.
> Can unclosed readers cause such a problem? We use a SearcherManager to avoid
> this, but the possibility is always there.
>
> On the other hand, we have the case I initially described: a fresh index that
> we populate. No reader is opened and no additional updates have been made.
> Therefore I see no reason why forceMerge should triple the size of the index
> at all.
>>>
>>> Unfortunately we can't close the writer, and we chose the force merge as an
>>> alternative that takes less effort. Could forceMergeDeletes serve our
>>> purpose here?
>>
>> It could, but it has the same problem as above. The only difference from
>> forceMerge is that it only merges segments which have deletions.
>>
>>> I will look into it with lsof, but I'm pretty sure the files will be held
>>> by some Java process.
>>>
>>> Jürgen.
>>>
>>> On 19.01.2015 at 13:36, Ian Lea wrote:
>>>>
>>>> Do you need to call forceMerge(1) at all? The javadoc, certainly for
>>>> recent versions of Lucene, advises against it. What version of Lucene
>>>> are you running?
>>>>
>>>> It might be helpful to run lsof against the index directory
>>>> before/during/after the merge to see what files are coming or going,
>>>> or if there are any marked as deleted but still present. That would
>>>> imply that something, somewhere, was holding on to the files.
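Uwe's explanation earlier in the thread, that deletions in one huge post-forceMerge(1) segment linger while small segments keep getting merged, can be illustrated with a deliberately simplified toy model. It keeps a single assumed rule, loosely modeled on TieredMergePolicy's maxMergedSegmentMB setting (5 GB by default): a segment whose live (non-deleted) size is at least half that cap is skipped by ordinary background merging. The real policy also scores candidate merges, favoring small segments and segments with many deletions, and none of that is modeled here. ToyMergeEligibility, Segment, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model only, NOT Lucene's TieredMergePolicy: it keeps just one assumed rule,
 * a size cutoff, to show why deletions in a single huge segment can linger.
 */
final class ToyMergeEligibility {
    static final long GB = 1024L * 1024 * 1024;

    /** Assumed cap, mirroring TieredMergePolicy's default maxMergedSegmentMB of 5 GB. */
    static final long MAX_MERGED_SEGMENT_BYTES = 5 * GB;

    /** Hypothetical stand-in for a segment: on-disk size plus fraction of deleted documents. */
    static final class Segment {
        final String name;
        final long sizeBytes;
        final double deletedFraction;

        Segment(String name, long sizeBytes, double deletedFraction) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.deletedFraction = deletedFraction;
        }

        /** Size pro-rated by deletions, i.e. roughly what a merge would keep. */
        long liveBytes() {
            return (long) (sizeBytes * (1.0 - deletedFraction));
        }
    }

    /** Assumed rule: once live size reaches half the cap, natural merges skip the segment. */
    static boolean eligibleForNaturalMerge(Segment s) {
        return s.liveBytes() < MAX_MERGED_SEGMENT_BYTES / 2;
    }

    /** Names of the segments that ordinary background merging may still pick. */
    static List<String> eligible(List<Segment> segments) {
        List<String> names = new ArrayList<>();
        for (Segment s : segments) {
            if (eligibleForNaturalMerge(s)) {
                names.add(s.name);
            }
        }
        return names;
    }

    /** Fraction of a segment that must be deleted before the toy rule lets it merge again. */
    static double deletedFractionNeeded(long sizeBytes) {
        return Math.max(0.0, 1.0 - (MAX_MERGED_SEGMENT_BYTES / 2.0) / sizeBytes);
    }
}
```

For an illustrative 10 GB single segment, deletedFractionNeeded(10 * GB) is 0.75: three quarters of its documents would have to be deleted before this rule lets ordinary merging reclaim the space, while small, freshly flushed segments stay eligible the whole time. That mirrors the shape of Uwe's argument, not necessarily Lucene's exact thresholds.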
>>>>
>>>> On Fri, Jan 16, 2015 at 1:57 PM, Jürgen Albert
>>>> <j.alb...@data-in-motion.biz> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Because we have constant updates on our index, we can't really close
>>>>> the index from time to time. Therefore we decided to trigger
>>>>> forceMerge when traffic is lowest, to clean up.
>>>>>
>>>>> On our development laptops (Windows and Linux) it works as expected,
>>>>> but on the real servers we see some weird behaviour.
>>>>>
>>>>> Scenario:
>>>>>
>>>>> We create a fresh index and populate it. This results in an index
>>>>> with a size of 2 GB. If we trigger forceMerge(1) and a commit()
>>>>> afterwards for this index, the index grows over the next 10 minutes
>>>>> to 6 GB and does not shrink back. During the whole process no reader
>>>>> is opened on the index.
>>>>>
>>>>> If I try the same stunt with the same data on my Windows laptop, it
>>>>> does nothing at all and finishes after a few ms.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Technical details:
>>>>> We use an MMapDirectory, and the server is a Debian 7 guest (kernel
>>>>> 3.2) running under KVM. The file system is ext4.
>>>>>
>>>>> Thx,
>>>>>
>>>>> Jürgen Albert.
>>>>>
>>>>> --
>>>>> Jürgen Albert
>>>>> Managing Director
>>>>>
>>>>> Data In Motion UG (haftungsbeschränkt)
>>>>>
>>>>> Kahlaische Str. 4
>>>>> 07745 Jena
>>>>>
>>>>> Mobile: 0157-72521634
>>>>> E-Mail: j.alb...@datainmotion.de
>>>>> Web: www.datainmotion.de
>>>>>
>>>>> XING: https://www.xing.com/profile/Juergen_Albert5
>>>>>
>>>>> Legal
>>>>>
>>>>> Jena HBR 507027
>>>>> VAT ID: DE274553639
>>>>> Tax no.: 162/107/04586
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
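As a complement to lsof, the growth in the original report (2 GB to 6 GB within ten minutes of forceMerge(1) and commit()) can be tracked by sampling the index directory before, during, and after the merge. Comparing the samples also shows which files are coming or going, as Ian suggests. This is a minimal sketch that uses only java.nio.file, and IndexDirSnapshot is a hypothetical helper. Unlike lsof, it only sees files that still have a directory entry, so space held by deleted-but-open files counts toward df but not toward totalBytes().

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: a point-in-time listing of file names and sizes in an index directory. */
final class IndexDirSnapshot {
    /** File name -> size in bytes, sorted by name. */
    final Map<String, Long> sizes;

    private IndexDirSnapshot(Map<String, Long> sizes) {
        this.sizes = sizes;
    }

    /** Records the regular files directly inside dir (a Lucene FSDirectory is flat). */
    static IndexDirSnapshot take(Path dir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (Files.isRegularFile(p)) {
                    try {
                        sizes.put(p.getFileName().toString(), Files.size(p));
                    } catch (NoSuchFileException e) {
                        // removed between listing and stat, e.g. by a concurrent merge
                    }
                }
            }
        }
        return new IndexDirSnapshot(sizes);
    }

    /** Sum of the sizes of all files that still have a directory entry. */
    long totalBytes() {
        long total = 0;
        for (long size : sizes.values()) {
            total += size;
        }
        return total;
    }

    /** Names present in this snapshot but not in the earlier one. */
    Set<String> addedSince(IndexDirSnapshot earlier) {
        Set<String> added = new TreeSet<>(sizes.keySet());
        added.removeAll(earlier.sizes.keySet());
        return added;
    }

    /** Names present in the earlier snapshot but gone from this one. */
    Set<String> removedSince(IndexDirSnapshot earlier) {
        Set<String> removed = new TreeSet<>(earlier.sizes.keySet());
        removed.removeAll(sizes.keySet());
        return removed;
    }
}
```

Suppose the old segment files disappear from the listing after the commit but df still reports the larger figure. Then the difference is plausibly held by deleted-but-open or mapped files, which is the lsof case. If instead the new files add up to the larger size, the growth is in the index itself.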