[ https://issues.apache.org/jira/browse/SOLR-10375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley updated SOLR-10375:
--------------------------------
Affects Version/s: (was: 6.2.1)
Priority: Minor (was: Major)
Description:
Using SolrIndexSearcher.doc(int n, StoredFieldVisitor visitor) (as can happen
with the UnifiedHighlighter in particular):
If the document cache has the document, it will call visitFromCached and hit an
out of memory error because of line 752 of SolrIndexSearcher -
visitor.stringField(info, f.stringValue().getBytes(StandardCharsets.UTF_8));
{code}
at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
at java.lang.StringCoding.encode(Ljava/nio/charset/Charset;[CII)[B (StringCoding.java:350)
at java.lang.String.getBytes(Ljava/nio/charset/Charset;)[B (String.java:941)
at org.apache.solr.search.SolrIndexSearcher.visitFromCached(Lorg/apache/lucene/document/Document;Lorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:685)
at org.apache.solr.search.SolrIndexSearcher.doc(ILorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:652)
{code}
This is due to the current String.getBytes(Charset) implementation, which
allocates the underlying byte array as charArrayLength * maxBytesPerCharacter,
which for UTF-8 is 3. 3 * 716MB exceeds Integer.MAX_VALUE, and the JVM cannot
allocate a single array larger than that, so an out of memory error is thrown.
The problem is not present when the document cache is disabled.
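For illustration only (not the actual patch): a minimal sketch of one way to avoid
the worst-case 3x allocation, by counting the exact UTF-8 length first and then
encoding into a buffer of exactly that size. The class and method names
(ExactUtf8, utf8Length, encodeUtf8Exact) are hypothetical.
{code}
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class ExactUtf8 {

  // Exact number of bytes the string needs in UTF-8 (assumes well-formed
  // input, i.e. no unpaired surrogates).
  static long utf8Length(String s) {
    long bytes = 0;
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      if (cp < 0x80) {
        bytes += 1;
      } else if (cp < 0x800) {
        bytes += 2;
      } else if (cp < 0x10000) {
        bytes += 3;
      } else {
        bytes += 4;
      }
      i += Character.charCount(cp);
    }
    return bytes;
  }

  // Encode into an exactly-sized array instead of the length*3 worst-case
  // array that String.getBytes(StandardCharsets.UTF_8) allocates up front.
  static byte[] encodeUtf8Exact(String s) {
    long len = utf8Length(s);
    if (len > Integer.MAX_VALUE) {
      throw new IllegalStateException("UTF-8 form too large for one byte[]: " + len);
    }
    ByteBuffer out = ByteBuffer.allocate((int) len);
    CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
    CoderResult cr = enc.encode(CharBuffer.wrap(s), out, true);
    if (!cr.isUnderflow() || !enc.flush(out).isUnderflow()) {
      throw new IllegalStateException("UTF-8 encoding failed: " + cr);
    }
    return out.array();
  }
}
{code}
For a mostly-ASCII value of roughly 716M characters this allocates roughly 716MB
once, instead of attempting the roughly 2.1GB worst-case array that exceeds the
maximum array size.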
was:
Using SolrIndexSearcher.doc(int n, StoredFieldVisitor visitor) -
If the document cache has the document, it will call visitFromCached and hit an
out of memory error because of line 752 of SolrIndexSearcher -
visitor.stringField(info, f.stringValue().getBytes(StandardCharsets.UTF_8));
{code}
at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
at java.lang.StringCoding.encode(Ljava/nio/charset/Charset;[CII)[B (StringCoding.java:350)
at java.lang.String.getBytes(Ljava/nio/charset/Charset;)[B (String.java:941)
at org.apache.solr.search.SolrIndexSearcher.visitFromCached(Lorg/apache/lucene/document/Document;Lorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:685)
at org.apache.solr.search.SolrIndexSearcher.doc(ILorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:652)
{code}
This is due to the current String.getBytes(Charset) implementation, which
allocates the underlying byte array as charArrayLength * maxBytesPerCharacter,
which for UTF-8 is 3. 3 * 716MB exceeds Integer.MAX_VALUE, and the JVM cannot
allocate a single array larger than that, so an out of memory error is thrown.
The problem is not present when the document cache is disabled.
Issue Type: Improvement (was: Bug)
Summary: Stored text retrieved via StoredFieldVisitor on doc in
the document cache over-estimates needed byte[] (was: Stored text > 716MB
retrieval with StoredFieldVisitor causes out of memory error with document
cache)
> Stored text retrieved via StoredFieldVisitor on doc in the document cache
> over-estimates needed byte[]
> ------------------------------------------------------------------------------------------------------
>
> Key: SOLR-10375
> URL: https://issues.apache.org/jira/browse/SOLR-10375
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Environment: Java 1.8.121, Linux x64
> Reporter: Michael Braun
> Priority: Minor
>
> Using SolrIndexSearcher.doc(int n, StoredFieldVisitor visitor) (as can
> happen with the UnifiedHighlighter in particular):
> If the document cache has the document, it will call visitFromCached and hit
> an out of memory error because of line 752 of SolrIndexSearcher -
> visitor.stringField(info, f.stringValue().getBytes(StandardCharsets.UTF_8));
> {code}
> at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
> at java.lang.StringCoding.encode(Ljava/nio/charset/Charset;[CII)[B (StringCoding.java:350)
> at java.lang.String.getBytes(Ljava/nio/charset/Charset;)[B (String.java:941)
> at org.apache.solr.search.SolrIndexSearcher.visitFromCached(Lorg/apache/lucene/document/Document;Lorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:685)
> at org.apache.solr.search.SolrIndexSearcher.doc(ILorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:652)
> {code}
> This is due to the current String.getBytes(Charset) implementation, which
> allocates the underlying byte array as charArrayLength * maxBytesPerCharacter,
> which for UTF-8 is 3. 3 * 716MB exceeds Integer.MAX_VALUE, and the JVM cannot
> allocate a single array larger than that, so an out of memory error is thrown.
> The problem is not present when the document cache is disabled.