[jira] [Commented] (SOLR-7927) Transaction log consumes lot of memory when indexing large documents

Yonik Seeley (JIRA) Fri, 14 Aug 2015 08:47:09 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697254#comment-14697254
 ]


Yonik Seeley commented on SOLR-7927:
------------------------------------

Hmmm, the transaction log uses an 8K buffer (IIRC), so memory use should really 
be limited to what JavaBin needs while serializing the document...
Is there a really large string in the document? Strings are first serialized 
into a temporary byte buffer sized for the worst case (4 bytes per char):
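{code}
    // end = number of chars of s to encode; reserve 4 bytes per char (worst case)
    int maxSize = end * 4;
    // 'bytes' is reused across calls and reallocated only when it is too small
    if (bytes == null || bytes.length < maxSize) bytes = new byte[maxSize];
    // sz = number of UTF-8 bytes actually produced
    int sz = ByteUtils.UTF16toUTF8(s, 0, end, bytes, 0);
{code}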
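To put numbers on that temporary buffer, here is a minimal JDK-only sketch that mirrors the sizing rule in the snippet (4 bytes reserved per UTF-16 char, scratch array reused and only ever grown). The class and method names (`TempBufferCost`, `Writer.encode`, `scratchSize`) are hypothetical, and the JDK's `CharsetEncoder` stands in for Solr's `ByteUtils.UTF16toUTF8`; this is an illustration, not Solr's code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class TempBufferCost {

    // Hypothetical stand-in for the quoted writeStr logic.
    static final class Writer {
        private byte[] bytes; // reused scratch buffer, grown when too small

        // Encodes s into the scratch buffer and returns the bytes actually used.
        int encode(String s) {
            int maxSize = s.length() * 4; // same worst-case sizing as the snippet
            if (bytes == null || bytes.length < maxSize) bytes = new byte[maxSize];
            CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
            ByteBuffer out = ByteBuffer.wrap(bytes);
            CoderResult r = enc.encode(CharBuffer.wrap(s), out, true);
            if (!r.isUnderflow()) throw new IllegalStateException(r.toString());
            enc.flush(out);
            return out.position();
        }

        int scratchSize() {
            return bytes == null ? 0 : bytes.length;
        }
    }

    public static void main(String[] args) {
        // One large ASCII field value, one million chars long.
        String big = "a".repeat(1_000_000);
        Writer w = new Writer();
        int used = w.encode(big);
        System.out.println("temporary buffer: " + w.scratchSize() + " bytes"); // 4000000
        System.out.println("bytes actually written: " + used);               // 1000000
    }
}
```

For ASCII text the scratch array is four times larger than the encoded output, and it is held in addition to the `String` itself. UTF-8 never needs more than 3 bytes per UTF-16 char, so the 4x reservation is conservative even for non-ASCII text.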

We should never have moved away from the old string format (CESU-8 plus the 
number of Java characters). It is so much more efficient for Java: no temporary 
byte buffer is needed, and because the length is already known up front, a 
string can be stream-encoded directly.
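A minimal sketch of that idea, assuming a plain 4-byte `writeInt` char-count prefix (the old format's actual prefix encoding is not given here) and hypothetical names `Cesu8Stream`, `writeStr` and `readStr`. It is not Solr's historical JavaBin code. Each UTF-16 char is written straight to the stream as 1 to 3 bytes, with surrogate halves encoded individually, which is what makes it CESU-8 rather than standard UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class Cesu8Stream {

    // Writes the char count, then each char directly; no temporary byte[] needed.
    static void writeStr(DataOutputStream out, String s) throws IOException {
        int n = s.length();
        out.writeInt(n); // assumed length prefix (number of Java chars)
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else { // includes each surrogate half, 3 bytes apiece
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
    }

    // Reads exactly the announced number of chars (no validation of bad input).
    static String readStr(DataInputStream in) throws IOException {
        int n = in.readInt();
        char[] chars = new char[n];
        for (int i = 0; i < n; i++) {
            int b = in.readUnsignedByte();
            if (b < 0x80) {
                chars[i] = (char) b;
            } else if (b < 0xE0) {
                chars[i] = (char) (((b & 0x1F) << 6) | (in.readUnsignedByte() & 0x3F));
            } else {
                int b2 = in.readUnsignedByte();
                int b3 = in.readUnsignedByte();
                chars[i] = (char) (((b & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F));
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) throws IOException {
        String s = "a\u00e9\u20ac\uD83D\uDE00"; // ASCII, 2-byte, 3-byte, surrogate pair
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        writeStr(new DataOutputStream(bos), s);
        System.out.println("encoded size: " + bos.size()); // 4 + 1 + 2 + 3 + 6 = 16
        String back = readStr(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println("round trip ok: " + back.equals(s));
    }
}
```

The writer only needs `s.length()` before it starts emitting bytes, so it can sit directly on top of a small buffered stream such as the transaction log's. The cost is 6 bytes instead of 4 for supplementary characters.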

A leader forwarding updates to its replicas would pay this same cost too.


> Transaction log consumes lot of memory when indexing large documents
> --------------------------------------------------------------------
>
>                 Key: SOLR-7927
>                 URL: https://issues.apache.org/jira/browse/SOLR-7927
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 5.2.1
>            Reporter: Shalin Shekhar Mangar
>             Fix For: Trunk, 5.4
>
>
> Solr is started with a 1280M heap:
> ./bin/solr start -m 1280m
> Indexing a 100MB JSON file (using curl) containing large JSON documents from 
> Project Gutenberg fails with OOM, but a 549MB JSON file containing small 
> documents is indexed just fine.
> The same 100MB JSON file with the same heap size can be indexed just fine if 
> I disable the transaction log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
