[ 
https://issues.apache.org/jira/browse/SOLR-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700390#comment-14700390
 ] 

Yonik Seeley commented on SOLR-7927:
------------------------------------

bq. For example, JavaBinCodec.writeStr creates a byte array of size 4 * 
string.length but the same can be done in 3 * string.length

Hmmm, that code (the change from CESU8 & number-of-java-chars to UTF8) was done 
by Mr Unicode Policeman, so it's interesting if it's wrong.
I don't know myself if there are any 16-bit patterns (including ones that are 
not valid UTF16) that blow up to 4 bytes when encoded as UTF8... the unicode 
replacement character is only 3 bytes too, so I can't find anything that would 
result in 4x.
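
A quick way to check the 3x bound is to encode every possible Java char on its own and see how large the UTF8 output gets. The sketch below is only an illustration: it uses the JDK's own encoder (String.getBytes), not the conversion code that JavaBinCodec actually calls, and the class and method names are made up for this example. The JDK encoder substitutes a replacement for a lone surrogate, so a different encoder could treat that case differently.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Solr.
class Utf8Expansion {

    // Largest UTF8 size produced by any single 16-bit Java char,
    // including unpaired surrogates (which the JDK encoder replaces).
    static int maxBytesPerSingleChar() {
        int max = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            String s = String.valueOf((char) c);
            max = Math.max(max, s.getBytes(StandardCharsets.UTF_8).length);
        }
        return max;
    }

    // UTF8 size of one supplementary code point, which occupies
    // two Java chars (a surrogate pair) in a String.
    static int bytesForSurrogatePair() {
        String s = new String(Character.toChars(0x1F600));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("max bytes for one char: " + maxBytesPerSingleChar());
        System.out.println("bytes for a surrogate pair (2 chars): " + bytesForSurrogatePair());
    }
}
```

With the JDK encoder, no single char needs more than 3 bytes, and a valid surrogate pair takes 4 bytes for 2 chars, i.e. 2 bytes per char. That is consistent with 3 * string.length being a sufficient buffer size.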

> Transaction log consumes lot of memory when indexing large documents
> --------------------------------------------------------------------
>
>                 Key: SOLR-7927
>                 URL: https://issues.apache.org/jira/browse/SOLR-7927
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 5.2.1
>            Reporter: Shalin Shekhar Mangar
>             Fix For: Trunk, 5.4
>
>
> Solr is started with 1280M heap.
> ./bin/solr start -m 1280m
> Indexing a 100MB JSON file (using curl) containing large JSON documents from 
> project Gutenberg fails with OOM but indexing a 549M JSON file containing 
> small documents is indexed just fine.
> The same 100MB JSON file with the same heap size can be indexed just fine if 
> I disable the transaction log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
