[ https://issues.apache.org/jira/browse/LUCENE-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera updated LUCENE-5941:
-------------------------------
    Attachment: LUCENE-5941.patch

Patch modifies the test to assert up to 3X the starting disk usage. I beasted the test and it fails with:

{noformat}
   [beaster] Started J0 PID(5216@SHAIE-TP).
   [beaster]   2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterForceMerge -Dtests.method=testForceMergeTempSpaceUsage -Dtests.seed=AEA68CE694BB7732 -Dtests.slow=true -Dtests.locale=ru_RU -Dtests.timezone=US/Pacific-New -Dtests.file.encoding=Cp1255
   [beaster] [09:05:02.935] FAILURE 1.52s | TestIndexWriterForceMerge.testForceMergeTempSpaceUsage <<<
   [beaster]    > Throwable #1: java.lang.AssertionError: forceMerge used too much temporary space: starting usage was 385138 bytes; max temp usage was 1216310 but should have been 1155414 (= 3X starting usage)
   [beaster]    >    at __randomizedtesting.SeedInfo.seed([AEA68CE694BB7732:B4644F15FAAB94F5]:0)
   [beaster]    >    at org.junit.Assert.fail(Assert.java:93)
   [beaster]    >    at org.junit.Assert.assertTrue(Assert.java:43)
   [beaster]    >    at org.apache.lucene.index.TestIndexWriterForceMerge.testForceMergeTempSpaceUsage(TestIndexWriterForceMerge.java:162)
{noformat}

I still haven't dug into this; I will do so later. But if anyone has an explanation for why we may consume up to 4X the starting disk usage, please post here.

> IndexWriter.forceMerge documentation error
> ------------------------------------------
>
> Key: LUCENE-5941
> URL: https://issues.apache.org/jira/browse/LUCENE-5941
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Shai Erera
> Assignee: Shai Erera
> Attachments: LUCENE-5941.patch
>
>
> IndexWriter.forceMerge documents that it requires up to 3X *FREE* space in
> order to run successfully. We even go further with it and test it in
> TestIWForceMerge.testForceMergeTempSpaceUsage(). But I think that's wrong.
> I cannot think of a situation where we consume 3X *additional* space during
> a merge:
> * 1X - that's the source segments to be merged
> * 2X - that's the resulting non-CFS merged segment
> * 3X - that's the CFS creation
> At no point do we publish the non-CFS merged segment, therefore the merge, as
> I understand it, only consumes up to 2X additional space during that merge.
> And anyway, we only require 2X additional space relative to the *largest* merge
> (or to the total batch of running merges, depending on your MergeScheduler),
> not the whole index size. This is an important observation: if you have, e.g.,
> a 500GB index, you shouldn't think you need to reserve an additional 1TB for
> merging, since most of your big segments won't be merged by default anyway
> (TieredMP defaults to a 5GB maximum merged segment size).
> I'll post a patch which fixes the documentation and the test. If anyone can
> think of a scenario where we consume up to 3X *additional* space, please
> chime in, and I'll only modify the IW.forceMerge documentation to explain that.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
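The Java sketch below is a back-of-the-envelope check of the 1X/2X/3X accounting in the description against the numbers from the beaster failure. It is a minimal sketch and not Lucene code. The class `MergeSpaceEstimate` and its methods are hypothetical. It also assumes that the merged non-CFS segment and the CFS are each about as large as the merge's source segments, which means no deletions are reclaimed. Under that model, a forceMerge of the whole index peaks at exactly the 3X limit (1155414 bytes here). The observed 1216310 bytes (about 3.16X) therefore falls outside the model, and that gap is the open question in the comment.

```java
import java.util.Locale;

/**
 * Hypothetical back-of-the-envelope model of disk usage during a merge,
 * following the 1X/2X/3X accounting in LUCENE-5941. Not Lucene code.
 */
final class MergeSpaceEstimate {

    /**
     * Peak bytes on disk while a merge builds its compound file (CFS).
     * Assumption: the merged non-CFS segment and the CFS are each about as
     * large as the merge's source segments (no deletions reclaimed).
     */
    static long peakDuringCfsMerge(long indexBytes, long mergeSourceBytes) {
        long nonCfsMerged = mergeSourceBytes; // "2X": merged segment, never published
        long cfs = mergeSourceBytes;          // "3X": CFS built from the non-CFS files
        return indexBytes + nonCfsMerged + cfs; // the index already holds the "1X" sources
    }

    /** Temporary space needed on top of the starting index size. */
    static long extraBytes(long indexBytes, long mergeSourceBytes) {
        return peakDuringCfsMerge(indexBytes, mergeSourceBytes) - indexBytes;
    }

    /** observed / start, formatted independently of the default locale (e.g. ru_RU). */
    static String formatRatio(long observedBytes, long startBytes) {
        return String.format(Locale.ROOT, "%.3f", (double) observedBytes / startBytes);
    }

    public static void main(String[] args) {
        // Numbers from the beaster failure in the comment.
        long start = 385_138L;
        long observedMax = 1_216_310L;
        System.out.println("3X limit          = " + 3 * start);
        System.out.println("model peak (full) = " + peakDuringCfsMerge(start, start));
        System.out.println("observed / start  = " + formatRatio(observedMax, start));

        // The 500GB example from the description.
        long gb = 1L << 30;
        long index = 500 * gb;
        long largestMerge = 5 * gb; // TieredMP's 5GB default cap, as cited in the issue
        System.out.println("extra GB, largest merge only = " + extraBytes(index, largestMerge) / gb);
        System.out.println("extra GB, whole index        = " + extraBytes(index, index) / gb);
    }
}
```

Running it prints a 3X limit and a model peak of 1155414 bytes, followed by a ratio of 3.158. It then prints 10 GB of headroom when only the largest (5GB) merge is counted, compared with 1000 GB (the "additional 1TB") when headroom is sized on the whole 500GB index.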