[ 
https://issues.apache.org/jira/browse/LUCENE-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-6779:
--------------------------------
    Attachment: LUCENE-6779_alt.patch

Here is my prototype, just to show you what I mean. This one just removes the 
extra buffer entirely.

So my suggestion would be, just benchmark/optimize/test this 
GrowableByteArrayDataOutput.writeString() itself.

If it needs an extra buffer added back to speed up small strings, then let's add 
it back *here*. I also think this thing does not need to be in a .util package, 
since it's only used by the .compressing package. Let's move it there, and let it 
make appropriate tradeoffs specific to writing this data for CompressingXXX. 

And test the hell out of it with unit tests if the logic must be any fancier.
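
A minimal sketch of what "no extra buffer" could look like, for illustration 
only. It is not the code in the attached patch. GrowableUtf8Buffer and its 
methods are hypothetical names; the real class is 
GrowableByteArrayDataOutput. The idea is a double pass: count the UTF-8 bytes 
first, grow the backing array once, then encode straight into it. Replacing 
unpaired surrogates with U+FFFD is a choice made for this sketch.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a growable byte output; not the class from the patch. */
final class GrowableUtf8Buffer {
    private byte[] bytes = new byte[16];
    private int length = 0;

    /** True if a valid high/low surrogate pair starts at index i. */
    private static boolean isPairAt(CharSequence s, int i) {
        return Character.isHighSurrogate(s.charAt(i))
            && i + 1 < s.length()
            && Character.isLowSurrogate(s.charAt(i + 1));
    }

    /** Pass 1: count the UTF-8 bytes needed without allocating anything. */
    static int utf8Length(CharSequence s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                n += 1;
            } else if (c < 0x800) {
                n += 2;
            } else if (isPairAt(s, i)) {
                n += 4; // surrogate pair becomes one 4-byte sequence
                i++;
            } else {
                n += 3; // other BMP char, or unpaired surrogate written as U+FFFD
            }
        }
        return n;
    }

    /** Grow the backing array (at least doubling) so that `extra` more bytes fit. */
    private void ensureCapacity(int extra) {
        if (length + extra > bytes.length) {
            bytes = Arrays.copyOf(bytes, Math.max(length + extra, bytes.length * 2));
        }
    }

    /** Variable-length int: 7 bits per byte, low bits first, high bit means "more". */
    void writeVInt(int v) {
        ensureCapacity(5);
        while ((v & ~0x7F) != 0) {
            bytes[length++] = (byte) ((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        bytes[length++] = (byte) v;
    }

    /** Pass 2: write the byte length, then encode directly into the backing array. */
    void writeString(String s) {
        int n = utf8Length(s);
        writeVInt(n);
        ensureCapacity(n);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes[length++] = (byte) c;
            } else if (c < 0x800) {
                bytes[length++] = (byte) (0xC0 | (c >> 6));
                bytes[length++] = (byte) (0x80 | (c & 0x3F));
            } else if (isPairAt(s, i)) {
                int cp = Character.toCodePoint(c, s.charAt(++i));
                bytes[length++] = (byte) (0xF0 | (cp >> 18));
                bytes[length++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                bytes[length++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                bytes[length++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                int cp = Character.isSurrogate(c) ? 0xFFFD : c;
                bytes[length++] = (byte) (0xE0 | (cp >> 12));
                bytes[length++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                bytes[length++] = (byte) (0x80 | (cp & 0x3F));
            }
        }
    }

    /** Copy of the bytes written so far. */
    byte[] toByteArray() {
        return Arrays.copyOf(bytes, length);
    }
}
```

The cost is a second walk over the chars, which is the part worth 
benchmarking for small strings before deciding whether a scratch buffer 
should come back.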


> Reduce memory allocated by CompressingStoredFieldsWriter to write large 
> strings
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-6779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6779
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Shalin Shekhar Mangar
>         Attachments: LUCENE-6779.patch, LUCENE-6779_alt.patch
>
>
> In SOLR-7927, I am trying to reduce the memory required to index very large 
> documents (between 10 and 100MB), and one of the places that allocates a lot of 
> heap is the UTF8 encoding in CompressingStoredFieldsWriter. The same problem 
> existed in JavaBinCodec and we reduced its memory allocation by falling back 
> to a double pass approach in SOLR-7971 when the utf8 size of the string is 
> greater than 64KB.
> I propose to make the same changes to CompressingStoredFieldsWriter as we 
> made to JavaBinCodec in SOLR-7971.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

