[ 
https://issues.apache.org/jira/browse/LUCENE-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729779#comment-14729779
 ] 

Shalin Shekhar Mangar commented on LUCENE-6779:
-----------------------------------------------

bq. Isn't this a mix of two things (buffering and coding)? I think it'd be 
nicer to have the DataOutput (or some decorator) take care of the buffering 
aspects and the routine could then focus on transcoding from UTF16 to UTF8.

Yes, but combining the two actually performs better than writing bytes directly 
to the DataOutput. I tested this with JavaBinCodec and I don't expect the 
results to be very different here (see the JMH benchmark results in SOLR-7971). 
Presumably, the huge number of writeByte invocations does not perform as well 
as setting bytes in a scratch array directly.
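
To make the scratch-array point concrete, here is a minimal sketch of a transcoder that stages UTF-8 bytes in a caller-supplied array and hands them to the output in bulk, instead of calling a per-byte write method. The class and method names (ScratchUtf8Writer, write) are hypothetical and are not taken from the patch, and java.io.OutputStream stands in for Lucene's DataOutput. Replacing unpaired surrogates with U+FFFD is a choice made for this sketch only.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical helper for illustration; not Lucene or Solr code. */
final class ScratchUtf8Writer {
    private ScratchUtf8Writer() {}

    /**
     * Encodes s as UTF-8, staging bytes in scratch and flushing them in bulk.
     * Returns the number of bytes written to out.
     */
    static long write(CharSequence s, OutputStream out, byte[] scratch) throws IOException {
        if (scratch.length < 4) {
            throw new IllegalArgumentException("scratch must hold at least 4 bytes");
        }
        long total = 0;
        int upto = 0;
        final int len = s.length();
        for (int i = 0; i < len; i++) {
            // Flush when fewer than 4 bytes (the longest UTF-8 sequence) remain free.
            if (upto > scratch.length - 4) {
                out.write(scratch, 0, upto);
                total += upto;
                upto = 0;
            }
            final char c = s.charAt(i);
            if (c < 0x80) {
                scratch[upto++] = (byte) c;
            } else if (c < 0x800) {
                scratch[upto++] = (byte) (0xC0 | (c >> 6));
                scratch[upto++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c) && i + 1 < len
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Valid surrogate pair: one supplementary code point, 4 bytes.
                final int cp = Character.toCodePoint(c, s.charAt(++i));
                scratch[upto++] = (byte) (0xF0 | (cp >> 18));
                scratch[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                scratch[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                scratch[upto++] = (byte) (0x80 | (cp & 0x3F));
            } else if (Character.isSurrogate(c)) {
                // Unpaired surrogate: emit U+FFFD (EF BF BD). Assumption of this sketch.
                scratch[upto++] = (byte) 0xEF;
                scratch[upto++] = (byte) 0xBF;
                scratch[upto++] = (byte) 0xBD;
            } else {
                scratch[upto++] = (byte) (0xE0 | (c >> 12));
                scratch[upto++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                scratch[upto++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        // Write whatever is left in the scratch array.
        out.write(scratch, 0, upto);
        return total + upto;
    }
}
```

The scratch array can be small and reused across calls, so the cost is one bulk write per filled buffer rather than one method call per byte. Whether that is faster in a given setup is something a benchmark has to confirm.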

> Reduce memory allocated by CompressingStoredFieldsWriter to write large 
> strings
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-6779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6779
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Shalin Shekhar Mangar
>         Attachments: LUCENE-6779.patch
>
>
> In SOLR-7927, I am trying to reduce the memory required to index very large 
> documents (between 10 and 100MB), and one of the places that allocates a lot 
> of heap is the UTF-8 encoding in CompressingStoredFieldsWriter. The same 
> problem existed in JavaBinCodec, and we reduced its memory allocation by 
> falling back to a double-pass approach in SOLR-7971 when the UTF-8 size of 
> the string is greater than 64KB.
> I propose to make the same changes to CompressingStoredFieldsWriter as we 
> made to JavaBinCodec in SOLR-7971.
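
A double-pass fallback of the kind described in the issue could look roughly like the sketch below. It is not the code from the patch or from SOLR-7971. The names (TwoPassStringWriter, utf8Length, writeString) are hypothetical, java.io.DataOutputStream with a 4-byte length prefix stands in for Lucene's DataOutput, and the worst-case estimate of 3 bytes per char used to pick the path is an assumption. The sketch also assumes well-formed UTF-16 input; the large-string path throws on unpaired surrogates.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a double-pass string writer; not Lucene or Solr code. */
final class TwoPassStringWriter {
    /** Above this worst-case size, switch to the double-pass path. */
    static final int THRESHOLD = 64 * 1024;

    private TwoPassStringWriter() {}

    /** First pass: counts UTF-8 bytes without allocating anything. */
    static long utf8Length(CharSequence s) {
        long bytes = 0;
        for (int i = 0, len = s.length(); i < len; i++) {
            final char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isHighSurrogate(c) && i + 1 < len
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4; // surrogate pair encodes one supplementary code point
                i++;
            } else {
                bytes += 3;
            }
        }
        return bytes;
    }

    /** Writes a 4-byte length prefix followed by the UTF-8 bytes of s. */
    static void writeString(String s, DataOutputStream out) throws IOException {
        // Small strings: a single pass into a temporary array is cheap enough.
        if ((long) s.length() * 3 <= THRESHOLD) {
            final byte[] b = s.getBytes(StandardCharsets.UTF_8);
            out.writeInt(b.length);
            out.write(b);
            return;
        }
        // Large strings: compute the exact length first, then stream the bytes
        // through a small fixed-size buffer instead of a string-sized array.
        out.writeInt(Math.toIntExact(utf8Length(s)));
        final CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        final CharBuffer in = CharBuffer.wrap(s);
        final ByteBuffer chunk = ByteBuffer.allocate(8192);
        while (true) {
            final CoderResult r = enc.encode(in, chunk, true);
            out.write(chunk.array(), 0, chunk.position());
            chunk.clear();
            if (r.isError()) {
                r.throwException(); // malformed input, e.g. an unpaired surrogate
            }
            if (r.isUnderflow()) {
                break; // all input consumed
            }
        }
        enc.flush(chunk); // no pending state for UTF-8, kept for API correctness
        out.write(chunk.array(), 0, chunk.position());
    }
}
```

The trade-off is an extra scan over the string in exchange for a bounded allocation: the large-string path needs only the 8KB chunk, regardless of how long the string is.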


