Steve Chen created LUCENE-7538:
----------------------------------

             Summary: Uploading large text file to a field causes "this 
IndexWriter is closed" error
                 Key: LUCENE-7538
                 URL: https://issues.apache.org/jira/browse/LUCENE-7538
             Project: Lucene - Core
          Issue Type: Bug
    Affects Versions: 5.5.1
            Reporter: Steve Chen


We have seen "this IndexWriter is closed" error after we tried to upload a 
large text file to a single Solr text field. The field definition in the 
schema.xml is:
{noformat}
<field name="fileContent" type="text_general" indexed="true" stored="true" 
termVectors="true" termPositions="true" termOffsets="true"/>
{noformat}

After that, the IndexWriter remained closed and couldn't be recovered until we 
reloaded the Solr core.  The text file had size of 800MB, containing only 
numbers and English characters.

Stack trace is shown below:
{noformat}
2016-11-02 23:00:17,913 [http-nio-19082-exec-3] ERROR 
org.apache.solr.handler.RequestHandlerBase - 
org.apache.solr.common.SolrException: Exception writing document id 1487_0_1 to 
the index; possible analysis error.
        at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:180)
        at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)
        at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:934)
        at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1089)
        at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:712)
        at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
        at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:126)
        at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:131)
        at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:237)
        at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
        at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:651)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)
        at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
        at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
        at 
veeva.ecm.common.interfaces.web.SolrDispatchOverride.doFilter(SolrDispatchOverride.java:43)
        at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
        at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
        at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
        at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
        at 
org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
        at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
        at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
        at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1096)
        at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
        at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500)
        at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at 
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is 
closed
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:720)
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:734)
        at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
        at 
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:282)
        at 
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)
        at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
        ... 34 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 56
        at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:201)
        at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:183)
        at 
org.apache.lucene.codecs.compressing.GrowableByteArrayDataOutput.writeString(GrowableByteArrayDataOutput.java:72)
        at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:292)
        at 
org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:382)
        at 
org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:321)
        at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234)
        at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
        at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1477)
        ... 37 more

{noformat}

We debugged and traced down the issue.  It was an integer overflow problem that 
was not properly handled.  The GrowableByteArrayDataOutput::writeString(String 
string) method is shown below:
{noformat}
@Override
  public void writeString(String string) throws IOException {
    int maxLen = string.length() * UnicodeUtil.MAX_UTF8_BYTES_PER_CHAR;
    if (maxLen <= MIN_UTF8_SIZE_TO_ENABLE_DOUBLE_PASS_ENCODING)  {
      // string is small enough that we don't need to save memory by falling 
back to double-pass approach
      // this is just an optimized writeString() that re-uses scratchBytes.
      scratchBytes = ArrayUtil.grow(scratchBytes, maxLen);
      int len = UnicodeUtil.UTF16toUTF8(string, 0, string.length(), 
scratchBytes);
      writeVInt(len);
      writeBytes(scratchBytes, len);
    } else  {
      // use a double pass approach to avoid allocating a large intermediate 
buffer for string encoding
      int numBytes = UnicodeUtil.calcUTF16toUTF8Length(string, 0, 
string.length());
      writeVInt(numBytes);
      bytes = ArrayUtil.grow(bytes, length + numBytes);
      length = UnicodeUtil.UTF16toUTF8(string, 0, string.length(), bytes, 
length);
    }
  }
{noformat}
The 800MB text file stored in the string parameter of the method had a length 
of 800 million, the maxLen became negative integer as the result of the length 
times 3. The negative integer was then passed into ArrayUtil.grow(scratchBytes, 
maxLen):
{noformat}
public static byte[] grow(byte[] array, int minSize) {
    assert minSize >= 0: "size must be positive (got " + minSize + "): likely 
integer overflow?";
    if (array.length < minSize) {
      byte[] newArray = new byte[oversize(minSize, 1)];
      System.arraycopy(array, 0, newArray, 0, array.length);
      return newArray;
    } else
      return array;
  }
{noformat}
Assertion was disabled in production so the execution won't stop. The original 
array was returned from the method call without increasing the size, which 
caused an ArrayIndexOutOfBoundsException to be thrown.  The 
ArrayIndexOutOfBoundsException was wrapped into AbortingException, and later on 
caused the IndexWriter to be closed in IndexWriter class.

The code should fail faster with a more-specific error for the integer overflow 
problem, and shouldn't cause the IndexWriter to be closed.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to