Hi Dat,

We have an entity called 'record', which contains a record id, a table name and a set of values. When we insert records into our data layer, we index them by the id and the values. Indexing is done in a separate thread; I'll explain how.

When we insert records into the data layer, we store them as blobs in the underlying data source (blobs if it is an RDBMS), and we also insert another record into a separate table. That index-record contains the ids of all the records which need to be indexed. The indexing thread reads the index-record and extracts all the record ids in it. There can be several index-records.

Earlier, we fetched all the index-records we currently had, extracted the record ids from them and indexed them using Lucene. Our performance tests and unit tests all passed. Then we changed the implementation to use iterators to extract these records, since keeping all the records in a List can cause OOM issues. Now the tests pass except for facet indexing.

I know it will not be easy to understand the context of the problem, so I have linked our source at [1]. When we use the method at line 312 instead of the method at line 330, we get the error from my first mail (quoted below). Note that the method is used at line 422.
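
For illustration only, the iterator-based approach described above can be sketched roughly as follows. This is not the code in AnalyticsDataIndexer: the class name BatchedIndexing, the method processInBatches and the batch size are hypothetical, and the handler simply stands in for whatever step hands a batch of records to Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not taken from the linked source.
class BatchedIndexing {

    // Drains the iterator in fixed-size batches so that only one batch
    // is held in memory at a time. Returns the number of batches handled.
    static <T> int processInBatches(Iterator<T> source, int batchSize,
                                    Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batches++;
                // Start a fresh list so the handler may keep the old one.
                batch = new ArrayList<>(batchSize);
            }
        }
        // Hand over the final, possibly smaller, batch.
        if (!batch.isEmpty()) {
            handler.accept(batch);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Record ids stand in for the ids read from the index-records.
        Iterator<String> ids =
                Arrays.asList("r1", "r2", "r3", "r4", "r5").iterator();
        int n = processInBatches(ids, 2,
                batch -> System.out.println("indexing " + batch));
        System.out.println(n + " batches");
    }
}
```

Each batch is passed to the handler as its own list, so nothing from an earlier batch is reused by a later one.
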

[1] https://github.com/gimantha/carbon-analytics/blob/master/components/analytics-core/org.wso2.carbon.analytics.dataservice/src/main/java/org/wso2/carbon/analytics/dataservice/indexing/AnalyticsDataIndexer.java

On Sun, Jun 14, 2015 at 7:13 PM, Đạt Cao Mạnh <caomanhdat...@gmail.com> wrote:

> Can you post you scenario in detail along with your modification please?
>
> On 14:09, Sun, 14 Jun 2015 Gimantha Bandara <giman...@wso2.com> wrote:
>
>> Hi Dat,
>>
>> I can reproduce this behavior even with like 50000 records. Is what you
>> said the only reason that make this exception occur?
>>
>> Thanks,
>>
>> On Sat, Jun 13, 2015 at 5:40 AM, Đạt Cao Mạnh <caomanhdat...@gmail.com>
>> wrote:
>>
>>> Hi, the total number of documents in an index of lucene is
>>> Integer.MAX_VALUE. So using a single lucene index to index billions
>>> documents is not a proper ways. You should consider using Solr Cloud or
>>> Elasticsearch to index your documents.
>>>
>>> On 19:43, Fri, 12 Jun 2015 Gimantha Bandara <giman...@wso2.com> wrote:
>>>
>>> > Hi all,
>>> >
>>> > We are using Lucene 4.10.3 for indexing. Recently we changed our
>>> > implementation so that we give data batchwise to lucene to index.
>>> > Earlier we just query all the data from the data source and index
>>> > all data at once. It works well. But the number of entries can be
>>> > up to billions. So getting all the data entries from the data source
>>> > causes OutOfMemory sometimes. So we changed the implementation to So
>>> > that Lucene indexes the data batchwise. Now we are getting the
>>> > following exception. Can anyone tell me what that exception means?
>>> >
>>> > java.lang.ArrayIndexOutOfBoundsException: 147
>>> >     at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.advance(Lucene41PostingsReader.java:538)
>>> >     at org.apache.lucene.search.TermScorer.advance(TermScorer.java:85)
>>> >     at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:82)
>>> >     at org.apache.lucene.search.ConjunctionScorer.nextDoc(ConjunctionScorer.java:100)
>>> >     at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
>>> >     at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
>>> >     at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
>>> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
>>> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
>>> >     at org.apache.lucene.facet.FacetsCollector.doSearch(FacetsCollector.java:294)
>>> >     at org.apache.lucene.facet.FacetsCollector.search(FacetsCollector.java:198)
>>> >
>>> > --
>>> > Gimantha Bandara
>>> > Software Engineer
>>> > WSO2. Inc : http://wso2.com
>>> > Mobile : +94714961919
>>
>> --
>> Gimantha Bandara
>> Software Engineer
>> WSO2. Inc : http://wso2.com
>> Mobile : +94714961919

--
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919