[ https://issues.apache.org/jira/browse/LUCENE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312971#comment-16312971 ]
Diego Ceccarelli edited comment on LUCENE-8118 at 1/5/18 11:53 AM:
-------------------------------------------------------------------

Looking at your code it seems that there is only one commit at the end, and your collection is big. What if you try to commit every, let's say, 50k docs?

was (Author: diegoceccarelli):
Looking at your code it seems that there is only one commit at the end, and your collection is big. Could you please try to commit every, let's say, 50k docs?

> ArrayIndexOutOfBoundsException in TermsHashPerField.writeByte during indexing
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-8118
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8118
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 7.2
>         Environment: Debian/Stretch
>                      java version "1.8.0_144"
>                      Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
>                      Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>            Reporter: Laura Dietz
>
> Indexing a large collection of about 20 million paragraph-sized documents
> results in an ArrayIndexOutOfBoundsException in
> org.apache.lucene.index.TermsHashPerField.writeByte (full stack trace below).
> The bug is possibly related to issues described in
> [here|http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-65536-td3661945.html]
> and [SOLR-10936|https://issues.apache.org/jira/browse/SOLR-10936] -- but I
> am not using SOLR, I am directly using Lucene Core.
> The issue can be reproduced using code from
> [GitHub trec-car-tools-example|https://github.com/TREMA-UNH/trec-car-tools/tree/lucene-bug/trec-car-tools-example]
> - compile with `mvn compile assembly:single`
> - run with `java -cp ./target/treccar-tools-example-0.1-jar-with-dependencies.jar edu.unh.cs.TrecCarBuildLuceneIndex paragraphs paragraphCorpus.cbor indexDir`
> Where paragraphCorpus.cbor is contained in this
> [archive|http://trec-car.cs.unh.edu/datareleases/v2.0-snapshot/archive-paragraphCorpus.tar.xz]
>
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -65536
>     at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:198)
>     at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:224)
>     at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:159)
>     at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
>     at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:786)
>     at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
>     at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
>     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:281)
>     at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:451)
>     at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1532)
>     at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1508)
>     at edu.unh.cs.TrecCarBuildLuceneIndex.main(TrecCarBuildLuceneIndex.java:55)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
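
For reference, the suggestion in the comment (commit every 50k docs instead of once at the end) could be sketched roughly as below. This is only an illustration of the loop shape, not the reporter's code: `PeriodicCommitSketch`, `DocSink`, `RecordingSink` and `indexAll` are hypothetical names made up for this sketch. With Lucene, `add` and `commit` would presumably delegate to `IndexWriter.addDocument` and `IndexWriter.commit`. Whether periodic commits actually avoid the exception is not established in this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class PeriodicCommitSketch {

    /** Hypothetical stand-in for the two writer calls the loop needs. */
    interface DocSink<D> {
        void add(D doc) throws Exception;
        void commit() throws Exception;
    }

    /**
     * Feeds documents to the sink one at a time and commits after every
     * commitEvery documents, plus once at the end for any remaining tail.
     * Returns the number of documents added.
     */
    static <D> long indexAll(Iterator<D> docs, DocSink<D> sink, int commitEvery)
            throws Exception {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        long count = 0;
        while (docs.hasNext()) {
            sink.add(docs.next());
            count++;
            if (count % commitEvery == 0) {
                sink.commit(); // intermediate commit
            }
        }
        if (count % commitEvery != 0) {
            sink.commit(); // final commit for the last partial batch
        }
        return count;
    }

    /** In-memory sink for the demo: records how many docs were added at each commit. */
    static final class RecordingSink implements DocSink<String> {
        final List<Long> commitPoints = new ArrayList<>();
        long added = 0;

        @Override
        public void add(String doc) {
            added++;
        }

        @Override
        public void commit() {
            commitPoints.add(added);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 125; i++) {
            docs.add("paragraph-" + i);
        }
        RecordingSink sink = new RecordingSink();
        long n = indexAll(docs.iterator(), sink, 50);
        // prints "125 docs, commits after [50, 100, 125]"
        System.out.println(n + " docs, commits after " + sink.commitPoints);
    }
}
```

In the real indexer the interval would be 50000 as suggested in the comment; the demo uses 50 only to keep the run small. Note that the stack trace goes through `IndexWriter.addDocuments`, so the reporter's loop may be shaped differently from this one-document-at-a-time sketch.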