[ https://issues.apache.org/jira/browse/SOLR-10375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley updated SOLR-10375:
--------------------------------
Affects Version/s: (was: 6.2.1)
Priority: Minor (was: Major)
Description:
Using SolrIndexSearcher.doc(int n, StoredFieldVisitor visitor) (as can happen
with the UnifiedHighlighter in particular):
If the document cache has the document, it will call visitFromCached and hit an
out of memory error because of line 752 of SolrIndexSearcher -
visitor.stringField(info, f.stringValue().getBytes(StandardCharsets.UTF_8));
{code}
at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
at java.lang.StringCoding.encode(Ljava/nio/charset/Charset;[CII)[B (StringCoding.java:350)
at java.lang.String.getBytes(Ljava/nio/charset/Charset;)[B (String.java:941)
at org.apache.solr.search.SolrIndexSearcher.visitFromCached(Lorg/apache/lucene/document/Document;Lorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:685)
at org.apache.solr.search.SolrIndexSearcher.doc(ILorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:652)
{code}
This is due to the current String.getBytes(Charset) implementation, which
allocates the underlying byte array as charArrayLength * maxBytesPerCharacter,
which for UTF-8 is 3. 3 * 716MB exceeds Integer.MAX_VALUE, and the JVM cannot
allocate a single array larger than that, so an out of memory error is thrown.
The problem is not present when the document cache is disabled.
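For illustration only (not the actual patch): a minimal sketch of one way to avoid
the worst-case 3x allocation, by counting the exact UTF-8 length first and then
encoding into a buffer of exactly that size. The class and method names
(ExactUtf8, utf8Length, encodeUtf8Exact) are hypothetical.
{code}
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class ExactUtf8 {

  // Exact number of bytes the string needs in UTF-8 (assumes well-formed
  // input, i.e. no unpaired surrogates).
  static long utf8Length(String s) {
    long bytes = 0;
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      if (cp < 0x80) {
        bytes += 1;
      } else if (cp < 0x800) {
        bytes += 2;
      } else if (cp < 0x10000) {
        bytes += 3;
      } else {
        bytes += 4;
      }
      i += Character.charCount(cp);
    }
    return bytes;
  }

  // Encode into an exactly-sized array instead of the length*3 worst-case
  // array that String.getBytes(StandardCharsets.UTF_8) allocates up front.
  static byte[] encodeUtf8Exact(String s) {
    long len = utf8Length(s);
    if (len > Integer.MAX_VALUE) {
      throw new IllegalStateException("UTF-8 form too large for one byte[]: " + len);
    }
    ByteBuffer out = ByteBuffer.allocate((int) len);
    CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
    CoderResult cr = enc.encode(CharBuffer.wrap(s), out, true);
    if (!cr.isUnderflow() || !enc.flush(out).isUnderflow()) {
      throw new IllegalStateException("UTF-8 encoding failed: " + cr);
    }
    return out.array();
  }
}
{code}
For a mostly-ASCII value of roughly 716M characters this allocates roughly 716MB
once, instead of attempting the roughly 2.1GB worst-case array that exceeds the
maximum array size.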
was:
Using SolrIndexSearcher.doc(int n, StoredFieldVisitor visitor) -
If the document cache has the document, it will call visitFromCached and hit an
out of memory error because of line 752 of SolrIndexSearcher -
visitor.stringField(info, f.stringValue().getBytes(StandardCharsets.UTF_8));
{code}
at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
at java.lang.StringCoding.encode(Ljava/nio/charset/Charset;[CII)[B (StringCoding.java:350)
at java.lang.String.getBytes(Ljava/nio/charset/Charset;)[B (String.java:941)
at org.apache.solr.search.SolrIndexSearcher.visitFromCached(Lorg/apache/lucene/document/Document;Lorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:685)
at org.apache.solr.search.SolrIndexSearcher.doc(ILorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:652)
{code}
This is due to the current String.getBytes(Charset) implementation, which
allocates the underlying byte array as charArrayLength * maxBytesPerCharacter,
which for UTF-8 is 3. 3 * 716MB exceeds Integer.MAX_VALUE, and the JVM cannot
allocate a single array larger than that, so an out of memory error is thrown.
The problem is not present when the document cache is disabled.
Issue Type: Improvement (was: Bug)
Summary: Stored text retrieved via StoredFieldVisitor on doc in
the document cache over-estimates needed byte[] (was: Stored text > 716MB
retrieval with StoredFieldVisitor causes out of memory error with document
cache)
> Stored text retrieved via StoredFieldVisitor on doc in the document cache
> over-estimates needed byte[]
> ------------------------------------------------------------------------------------------------------
>
> Key: SOLR-10375
> URL: https://issues.apache.org/jira/browse/SOLR-10375
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Environment: Java 1.8.121, Linux x64
> Reporter: Michael Braun
> Priority: Minor
>
> Using SolrIndexSearcher.doc(int n, StoredFieldVisitor visitor) (as can
> happen with the UnifiedHighlighter in particular):
> If the document cache has the document, it will call visitFromCached and hit
> an out of memory error because of line 752 of SolrIndexSearcher -
> visitor.stringField(info, f.stringValue().getBytes(StandardCharsets.UTF_8));
> {code}
> at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
> at java.lang.StringCoding.encode(Ljava/nio/charset/Charset;[CII)[B (StringCoding.java:350)
> at java.lang.String.getBytes(Ljava/nio/charset/Charset;)[B (String.java:941)
> at org.apache.solr.search.SolrIndexSearcher.visitFromCached(Lorg/apache/lucene/document/Document;Lorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:685)
> at org.apache.solr.search.SolrIndexSearcher.doc(ILorg/apache/lucene/index/StoredFieldVisitor;)V (SolrIndexSearcher.java:652)
> {code}
> This is due to the current String.getBytes(Charset) implementation, which
> allocates the underlying byte array as charArrayLength * maxBytesPerCharacter,
> which for UTF-8 is 3. 3 * 716MB exceeds Integer.MAX_VALUE, and the JVM cannot
> allocate a single array larger than that, so an out of memory error is thrown.
> The problem is not present when the document cache is disabled.