[
https://issues.apache.org/jira/browse/LUCENE-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shalin Shekhar Mangar updated LUCENE-6779:
------------------------------------------
Attachment: LUCENE-6779.patch
Patch with the changes.
> Reduce memory allocated by CompressingStoredFieldsWriter to write large
> strings
> -------------------------------------------------------------------------------
>
> Key: LUCENE-6779
> URL: https://issues.apache.org/jira/browse/LUCENE-6779
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Shalin Shekhar Mangar
> Attachments: LUCENE-6779.patch
>
>
> In SOLR-7927, I am trying to reduce the memory required to index very large
> documents (between 10 and 100 MB). One of the places that allocates a lot of
> heap is the UTF-8 encoding in CompressingStoredFieldsWriter. The same problem
> existed in JavaBinCodec, and we reduced its memory allocation in SOLR-7971 by
> falling back to a double-pass approach when the UTF-8 size of the string is
> greater than 64KB.
> I propose to make the same changes to CompressingStoredFieldsWriter as we
> made to JavaBinCodec in SOLR-7971.
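
For readers unfamiliar with the double-pass idea, here is a minimal sketch of the general technique. It is not the attached patch and not the SOLR-7971 code. The class name TwoPassUtf8Writer, the 8 KB chunk size, and the 4-byte length prefix are all assumptions made for illustration, and the sketch always takes two passes rather than switching over at the 64KB threshold mentioned above. The first pass only counts the UTF-8 bytes, so the length can be written up front; the second pass encodes through a small fixed buffer, so the heap used does not grow with the string.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class TwoPassUtf8Writer {
  // Assumed scratch buffer size; the real code may use a different value.
  static final int CHUNK_BYTES = 8 * 1024;

  static boolean isPair(CharSequence s, int i) {
    return Character.isHighSurrogate(s.charAt(i))
        && i + 1 < s.length()
        && Character.isLowSurrogate(s.charAt(i + 1));
  }

  // Pass 1: count the UTF-8 bytes without allocating an encoded copy.
  static int utf8Length(CharSequence s) {
    int bytes = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        bytes += 1;
      } else if (c < 0x800) {
        bytes += 2;
      } else if (isPair(s, i)) {
        bytes += 4; // supplementary code point
        i++;
      } else {
        bytes += 3; // BMP char, or unpaired surrogate written as U+FFFD
      }
    }
    return bytes;
  }

  // Pass 2: write a 4-byte big-endian length, then encode chunk by chunk.
  static void writeString(CharSequence s, OutputStream out) throws IOException {
    int length = utf8Length(s);
    out.write(length >>> 24);
    out.write(length >>> 16);
    out.write(length >>> 8);
    out.write(length);

    byte[] buf = new byte[CHUNK_BYTES];
    int upto = 0;
    for (int i = 0; i < s.length(); i++) {
      if (upto > buf.length - 4) { // keep room for the longest sequence
        out.write(buf, 0, upto);
        upto = 0;
      }
      char c = s.charAt(i);
      int cp = c;
      if (isPair(s, i)) {
        cp = Character.toCodePoint(c, s.charAt(++i));
      } else if (Character.isSurrogate(c)) {
        cp = 0xFFFD; // this sketch replaces unpaired surrogates
      }
      if (cp < 0x80) {
        buf[upto++] = (byte) cp;
      } else if (cp < 0x800) {
        buf[upto++] = (byte) (0xC0 | (cp >> 6));
        buf[upto++] = (byte) (0x80 | (cp & 0x3F));
      } else if (cp < 0x10000) {
        buf[upto++] = (byte) (0xE0 | (cp >> 12));
        buf[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
        buf[upto++] = (byte) (0x80 | (cp & 0x3F));
      } else {
        buf[upto++] = (byte) (0xF0 | (cp >> 18));
        buf[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
        buf[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
        buf[upto++] = (byte) (0x80 | (cp & 0x3F));
      }
    }
    out.write(buf, 0, upto);
  }
}
```

The trade-off is that the string is scanned twice, which costs some CPU, in exchange for never holding a second full-size byte[] copy of a 10 to 100 MB value on the heap.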