[ 
https://issues.apache.org/jira/browse/LUCENE-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132554#comment-14132554
 ] 

Shai Erera commented on LUCENE-5941:
------------------------------------

bq. Well i think its related to windows.

The only thing that's related to Windows is the inability to delete a file, 
i.e. you get an exception. But even on Unix, that deleted file still consumes 
disk space until the last process that holds it open releases it. And that's 
true not just for merges but also for newly flushed segments: they're first 
flushed as non-CFS files, then we pack them into a CFS. We can fail to delete 
these temp files there too.
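
A minimal Java sketch of the delete-while-open behavior described above. This is not Lucene code: the class and method names are hypothetical. It assumes a Unix-like filesystem for the "deleted but still readable" outcome; on Windows the delete may fail with an exception instead, depending on how the file was opened.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo, not part of Lucene.
final class DeleteWhileOpen {

    // Deletes a temp file while a reader still holds it open and reports
    // what happened on this platform.
    static String deleteWhileOpen() {
        try {
            Path file = Files.createTempFile("segment", ".tmp");
            Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
            try (FileChannel reader = FileChannel.open(file, StandardOpenOption.READ)) {
                try {
                    Files.delete(file);
                } catch (IOException e) {
                    // The outcome the comment attributes to Windows.
                    return "delete failed while open: " + e.getClass().getSimpleName();
                }
                // The name is gone, but the open handle still reaches the
                // data, so the blocks are not freed until the reader closes.
                ByteBuffer buf = ByteBuffer.allocate(5);
                int n = reader.read(buf);
                return "deleted, still readable bytes: " + n;
            } finally {
                Files.deleteIfExists(file); // clean up if the delete failed
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(deleteWhileOpen());
    }
}
```

Either way the space is not reclaimed while the reader is open; the two platforms differ only in whether the delete call itself reports a failure.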

I try to distinguish between when *Lucene* consumes more disk space 
deliberately, vs when your OS/app does. Your app can hold open a Reader on 
every commit point and never close them, and therefore we will always fail to 
delete the files. We don't care about that.

But when does *Lucene* consume 2/3/4X disk space on purpose?? I get the 3X: 
existing segments + temp non-CFS merged segment + CFS merged segment. But I 
don't get the 4X ... and consequently I still haven't figured out why the 
test fails.
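
To make the 3X accounting above concrete, here is a rough Java sketch. It is not Lucene code, and the class and method names are made up for illustration. It assumes the merged segment is as large as its sources (an upper bound), that the temp non-CFS files are deleted before the source segments (an assumed order), and that no reader holds any of the deleted files open.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: tracks bytes on disk across the
// phases of one merge that ends in a compound file (CFS).
final class MergeDiskAccounting {

    // Assumes the merged segment is as large as its sources (an upper bound)
    // and that deleted files are released immediately.
    static Map<String, Long> usageByPhase(long sourceBytes) {
        Map<String, Long> usage = new LinkedHashMap<>();
        long onDisk = sourceBytes;
        usage.put("source segments only", onDisk);
        onDisk += sourceBytes;
        usage.put("+ temp non-CFS merged segment", onDisk);
        onDisk += sourceBytes;
        usage.put("+ CFS merged segment", onDisk);
        onDisk -= sourceBytes;
        usage.put("- temp non-CFS files deleted", onDisk);
        onDisk -= sourceBytes;
        usage.put("- source segments deleted", onDisk);
        return usage;
    }

    // Largest amount of disk in use at any phase.
    static long peak(Map<String, Long> usage) {
        long max = 0;
        for (long v : usage.values()) {
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        long x = 1; // treat the source segments as 1X
        Map<String, Long> usage = usageByPhase(x);
        usage.forEach((phase, v) -> System.out.println(v + "X  " + phase));
        System.out.println("peak total: " + peak(usage) + "X, peak additional: "
                + (peak(usage) - x) + "X");
    }
}
```

Under these assumptions the peak is 3X in total, i.e. 2X on top of the existing segments, and no phase reaches 4X.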

> IndexWriter.forceMerge documentation error
> ------------------------------------------
>
>                 Key: LUCENE-5941
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5941
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-5941.patch
>
>
> IndexWriter.forceMerge documents that it requires up to 3X *FREE* space in 
> order to run successfully. We even go further with it and test it in 
> TestIWForceMerge.testForceMergeTempSpaceUsage(). But I think that's wrong. I 
> cannot think of a situation where we consume 3X *additional* space during 
> merge:
> * 1X - that's the source segments to be merged
> * 2X - that's the result non-CFS merged segment
> * 3X - that's the CFS creation
> At no point do we publish the non-CFS merged segment, therefore the merge, as 
> I understand it, only consumes up to 2X additional space during that merge.
> And anyway, we only require 2X additional space relative to the *largest* 
> merge (or the total batch of running merges, depending on your 
> MergeScheduler), not the whole index size. This is an important observation: 
> users with e.g. a 500GB index shouldn't think they need to reserve an 
> additional 1TB for merging, since most of their big segments won't be merged 
> by default anyway (TieredMP defaults to a 5GB largest segment).
> I'll post a patch which fixes the documentation and the test. If anyone can 
> think of a scenario where we consume up to 3X *additional* space, please 
> chime in, and I'll only modify the IW.forceMerge documentation to explain that.
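
As a rough illustration of the 500GB example in the description, here is a small Java sketch. It is not Lucene code and the names are hypothetical. The 2X factor is the estimate argued in the description, not a documented guarantee, and the 5GB figure is the TieredMP default cited there.

```java
// Hypothetical helper, not part of Lucene.
final class MergeHeadroom {
    static final long GB = 1L << 30;

    // Estimated extra free space for merging sources of the given total size:
    // one non-CFS merged copy plus one CFS copy, each assumed to be no larger
    // than the sources.
    static long additionalBytes(long mergeInputBytes) {
        return Math.multiplyExact(2L, mergeInputBytes);
    }

    public static void main(String[] args) {
        long wholeIndex = 500 * GB;
        long largestMerge = 5 * GB; // TieredMP default, per the description
        System.out.println("sized by whole index:   "
                + additionalBytes(wholeIndex) / GB + " GB");
        System.out.println("sized by largest merge: "
                + additionalBytes(largestMerge) / GB + " GB");
    }
}
```

If several merges run concurrently, the input to `additionalBytes` would be the total size of the running batch rather than a single merge.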



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
