Those are the postings array and its staging area for flushing. Once flushed, a Posting object can be deleted.
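To make that last point concrete, here is a minimal sketch in C++. It is not CLucene code: `Posting`, `PostingPool`, `hash`, `freeList` and `releaseFreeList` are hypothetical names chosen for illustration. It only shows that deleting an object is safe once no other container still holds a pointer to it, here by clearing any matching lookup slot before the delete.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a posting record; not CLucene's actual Posting class.
struct Posting {
    int docFreq = 0;
};

// Hypothetical pair of containers that may end up holding the same raw pointer.
struct PostingPool {
    std::vector<Posting*> hash;      // lookup slots; nullptr means empty
    std::vector<Posting*> freeList;  // postings that are about to be deleted

    // Delete every posting on the free list. Before each delete, clear any
    // lookup slot that still refers to the same object, so that no dangling
    // pointer is left behind. Returns the number of lookup slots cleared.
    std::size_t releaseFreeList() {
        std::size_t cleared = 0;
        for (Posting*& p : freeList) {
            if (p == nullptr) continue;
            for (Posting*& slot : hash) {
                if (slot == p) {
                    slot = nullptr;
                    ++cleared;
                }
            }
            delete p;
            p = nullptr;
        }
        freeList.clear();
        return cleared;
    }
};
```

The linear scan over the lookup slots is for clarity only; a real index would need a cheaper way to know whether a posting is still referenced.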
The code you quoted was originally written in Java as:

Arrays.fill(postingsFreeList, postingsFreeCount-numToFree, postingsFreeCount, null);

That is, it is not a deletion but a nullification. This may well be correct behavior in Java, since the garbage collector reclaims objects once nothing references them. However, it seems to have caused issues in JLucene as well for documents with many terms: https://issues.apache.org/jira/browse/LUCENE-1072.

The only question is why we haven't seen this until now, and what's special about the Reuters corpus?

I think that if you could port TestDocuemntsWriter to cl_test (at least the relevant test case they added) and check whether it crashes with the same characteristics as your issue, we could verify this is the same issue. Then we can apply their patch to DocumentsWriter.cpp accordingly (while following the JIRA discussion).

Itamar.

> -----Original Message-----
> From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> Sent: Tuesday, June 22, 2010 11:53 PM
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
>
> I did some research and found the following:
>
> The problem is caused by the freeing cycle in balanceRAM()
> (documentswriter.cpp:1325)
>
> for ( size_t i = this->postingsFreeCountDW-numToFree; i < this->postingsFreeListDW.length; i++ ){
>     _CLDELETE(this->postingsFreeListDW.values[i]);
> }
>
> Because this->postingsFreeListDW.values contains pointers
> which are also used in the postingsHash table, the _CLDELETE
> makes them invalid.
>
> So the main question is why Posting objects referenced in
> postingsHash are also referenced by postingsFreeListDW.
>
> Until now I was not able to find the reason.
>
> Borek
>
> > -----Original Message-----
> > From: Itamar Syn-Hershko [mailto:ita...@divrei-tora.com]
> > Sent: Monday, June 21, 2010 2:08 PM
> > To: clucene-developers@lists.sourceforge.net
> > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> >
> > Hi,
> >
> > This seems to be the same error reported by Klemens Friedl last week [1].
> >
> > I can confirm your findings. After setting the demo application to
> > index the Reuters corpora distributed with CLucene (see my patch to
> > master today), and setting maxFieldLength to MAX_INT, the application
> > fails on one of the files (for me it was reut2-002.sgm). The call
> > stack points to DocumentsWriterThreadState.cpp ln 1142, where
> > threadState->p is pointing to freed or invalid memory.
> >
> > Unfortunately, at the moment I cannot work on tracing this properly. If
> > you can do this yourself, I'll be happy to assist with whatever I can.
> >
> > Itamar.
> >
> > [1] http://comments.gmane.org/gmane.comp.jakarta.lucene.clucene.devel/3449
> > Also see
> > http://sourceforge.net/tracker/?func=detail&aid=2981449&group_id=80013&atid=558446
> >
> > -----Original Message-----
> > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > Sent: Monday, June 21, 2010 2:50 PM
> > To: clucene-developers@lists.sourceforge.net
> > Subject: [CLucene-dev] vector subscript out of range exception during indexing
> >
> > While indexing a set of documents (about 10000 already
> > indexed) I get the exception "vector subscript out of range"
> > from ArrayBase operator [ ].
> > I did some research and it seems it is because the
> > threadState->postingEquals() method is called with an invalid p set.
> > postingsHash[hashPos] probably contains a pointer to an already
> > deleted object, as 0xfeee is in all members (I'm running it under
> > the MSVC 2005 debugger).
> > See the call stack and threadState->p dump below.
> >
> > Source (documentswriterthreadstate.cpp:1010)
> > ======
> >
> > // Locate Posting in hash
> > threadState->p = postingsHash[hashPos];
> >
> > if (threadState->p != NULL &&
> >     !threadState->postingEquals(tokenText, tokenTextLen)) { ...
> >
> >
> > Call stack
> > ========
> > clucene-cored.dll!lucene::util::ArrayBase<wchar_t *>::operator[](unsigned int _Pos=0xfffffbbb) Line 92 C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::postingEquals(const wchar_t * tokenText=0x032772a8, const int tokenTextLen=0x00000008) Line 577 + 0x25 bytes C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::addPosition(lucene::analysis::Token * token=0x0100c770) Line 1012 + 0x26 bytes C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field * field=0x04d2a9e0, lucene::analysis::Analyzer * analyzer=0x010a5fa0, const int maxFieldLength=0x00002710) Line 902 C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 797 C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 554 + 0x1a bytes C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::updateDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0, lucene::index::Term * delTerm=0x00000000) Line 934 + 0xc bytes C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 919 C++
> >
> > clucene-cored.dll!lucene::index::IndexWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 670 + 0x13 bytes C++
> >
> > clucene-cored.dll!lucene::index::IndexModifier::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * docAnalyzer=0x010a5fa0) Line 100 C++
> >
> > mkidx.exe!tovek::index::Index::indexDocument(tovek::index::Document & doc={...}, bool bInsert=false, unsigned long & ulPrevDoc=0x00000007, tovek::analysis::CachedAnalyzer * pCachedAnalyzer=0x010a5fa0) Line 472 C++
> >
> >
> > Problematic item in PostingHash:
> > =========================
> >
> > - threadState->p 0x02538fd8 {textStart=0xfeeefeee docFreq=0xfeeefeee freqStart=0xfeeefeee ...} lucene::index::DocumentsWriter::Posting *
> >     textStart 0xfeeefeee int
> >     docFreq 0xfeeefeee int
> >     freqStart 0xfeeefeee int
> >     freqUpto 0xfeeefeee int
> >     proxStart 0xfeeefeee int
> >     proxUpto 0xfeeefeee int
> >     lastDocID 0xfeeefeee int
> >     lastDocCode 0xfeeefeee int
> >     lastPosition 0xfeeefeee int
> >   + vector 0xfeeefeee {p=??? lastOffset=??? offsetStart=??? ...} lucene::index::DocumentsWriter::PostingVector *
> >
> > ------------------------------------------------------------------------------
> > ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad
> > Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit.
> > See the prize list and enter to win:
> > http://p.sf.net/sfu/thinkgeek-promo
> > _______________________________________________
> > CLucene-developers mailing list
> > CLucene-developers@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/clucene-developers
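
For comparison, the Java Arrays.fill call quoted at the top of this message touches only the range [postingsFreeCount-numToFree, postingsFreeCount), while the quoted C++ loop runs up to postingsFreeListDW.length. Whether that difference is related to the crash is not established in this thread. The sketch below uses hypothetical names (`Posting`, `releaseTail`) and assumes the free list is the sole owner of the postings in that range. It shows a C++ loop with the same bounds as the Java call, which also nulls each slot after deleting it. Returning the reduced count is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a posting record; not CLucene's actual Posting class.
struct Posting {
    int docFreq = 0;
};

// Release the last numToFree live entries of a free list.
// Only slots in [freeCount - numToFree, freeCount) are touched, matching the
// half-open range used by the Java Arrays.fill call. Each slot is deleted and
// then set to nullptr so that a later read cannot see a stale pointer.
// Returns the new number of live entries.
std::size_t releaseTail(std::vector<Posting*>& freeList,
                        std::size_t freeCount,
                        std::size_t numToFree) {
    assert(freeCount <= freeList.size());
    assert(numToFree <= freeCount);
    for (std::size_t i = freeCount - numToFree; i < freeCount; ++i) {
        delete freeList[i];     // deleting nullptr is a harmless no-op
        freeList[i] = nullptr;
    }
    return freeCount - numToFree;
}
```

This does not help if another structure, such as a hash table, still holds one of the deleted pointers; in that case the other reference has to be cleared first.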