[ 
https://issues.apache.org/jira/browse/LUCENE-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390318#comment-14390318
 ] 

Robert Muir commented on LUCENE-6383:
-------------------------------------

In this case merging creates a bigger index: 300 KB of segments becomes a 
single 450 KB segment. So it's not exactly the same problem...

> MemoryPostings fst encoding can be surprisingly inefficient (especially in 
> tests, with payloads)
> ------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6383
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6383
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>
> I just worked around this in 2 nightly OOM fails.
> One was TestDuelingCodecs, the other was TestIndexWriterForceMerge's space 
> usage test.
> In general the trend is the same: the more documents you merge, the bigger 
> the FST outputs get, and the size of this PF in RAM and on disk grows in a 
> way you don't expect. E.g. merging 300 KB of segments resulted in a single 
> 450 KB segment, and memory usage gets absurdly high.
> The issue seems especially aggravated in tests, when MockAnalyzer adds lots 
> of payloads.
> Maybe it should encode the postings data in a more efficient way? Can it just 
> use a Long output pointing into a RAMFile or something? Or maybe there is 
> just a crazy bug?
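
For illustration only, here is a rough sketch of the "Long output pointing into 
a RAMFile" idea from the description. It is not the MemoryPostingsFormat code 
and uses no Lucene classes: a TreeMap stands in for the FST, a 
ByteArrayOutputStream stands in for the RAMFile, and the class and method names 
(OffsetPostingsSketch, add, get, offsetOf) are hypothetical. The point is only 
that the term dictionary holds one long per term, while the postings bytes 
(including payloads) live once in a side buffer.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.TreeMap;

/** Hypothetical sketch: terms map to long offsets into one shared byte buffer. */
public class OffsetPostingsSketch {
    // Stand-in for the FST; a real version would be an FST with long outputs.
    private final TreeMap<String, Long> termToOffset = new TreeMap<>();
    // Stand-in for the RAMFile that holds the encoded postings.
    private final ByteArrayOutputStream blob = new ByteArrayOutputStream();
    // Cached copy of the buffer for reads; dropped whenever a term is added.
    private byte[] snapshot;

    /** Appends the postings bytes for a term and records where they start. */
    public void add(String term, byte[] postings) {
        if (termToOffset.containsKey(term)) {
            throw new IllegalArgumentException("duplicate term: " + term);
        }
        termToOffset.put(term, (long) blob.size());
        writeVInt(postings.length);               // length prefix
        blob.write(postings, 0, postings.length); // raw postings bytes
        snapshot = null;
    }

    /** Returns the start offset recorded for a term, or null if absent. */
    public Long offsetOf(String term) {
        return termToOffset.get(term);
    }

    /** Looks up the offset, then decodes the length-prefixed bytes at it. */
    public byte[] get(String term) {
        Long offset = termToOffset.get(term);
        if (offset == null) {
            return null;
        }
        if (snapshot == null) {
            snapshot = blob.toByteArray();
        }
        int pos = offset.intValue();
        int len = 0;
        int shift = 0;
        byte b;
        do { // variable-length int: 7 data bits per byte, high bit = "more"
            b = snapshot[pos++];
            len |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return Arrays.copyOfRange(snapshot, pos, pos + len);
    }

    private void writeVInt(int v) {
        while ((v & ~0x7F) != 0) {
            blob.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        blob.write(v);
    }
}
```

With this layout the dictionary entry for a term stays the same size no matter 
how many documents or payloads its postings contain. Whether that would 
actually fix the growth reported here is an open question, since the cause has 
not been identified yet.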



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
