Hi, This seems to be the same error reported by Klemens Friedl last week [1].
I can confirm your findings. After setting the demo application to index the Reuters corpora distributed with CLucene (see my patch to master today) and setting maxFieldLength to MAX_INT, the application fails on one of the files (for me it was reut2-002.sgm). The call stack points to DocumentsWriterThreadState.cpp, line 1142, where threadState->p points to freed or invalid memory.

Unfortunately, I cannot work on tracing this properly at the moment. If you can do this yourself, I'll be happy to assist with whatever I can.

Itamar.

[1] http://comments.gmane.org/gmane.comp.jakarta.lucene.clucene.devel/3449
Also see http://sourceforge.net/tracker/?func=detail&aid=2981449&group_id=80013&atid=558446

> -----Original Message-----
> From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> Sent: Monday, June 21, 2010 2:50 PM
> To: clucene-developers@lists.sourceforge.net
> Subject: [CLucene-dev] vector subscript out of range exception during indexing
>
> While indexing a set of documents (about 10000 already indexed), I get
> the exception "vector subscript out of range" from ArrayBase operator [ ].
> I did some research, and it seems to happen because the
> threadState->postingEquals() method is called with an invalid
> threadState->p set.
> postingsHash[hashPos] probably contains a pointer to an already deleted
> object, as 0xfeee is in all members (I'm running it under the MSVC 2005
> debugger).
> See the call stack and the threadState->p dump below.
>
> Source (documentswriterthreadstate.cpp:1010)
> ======
>
> // Locate Posting in hash
> threadState->p = postingsHash[hashPos];
>
> if (threadState->p != NULL &&
>     !threadState->postingEquals(tokenText, tokenTextLen)) { ...
>
>
> Call stack
> ========
> clucene-cored.dll!lucene::util::ArrayBase<wchar_t *>::operator[](unsigned int _Pos=0xfffffbbb) Line 92 C++
> clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::postingEquals(const wchar_t * tokenText=0x032772a8, const int tokenTextLen=0x00000008) Line 577 + 0x25 bytes C++
> clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::addPosition(lucene::analysis::Token * token=0x0100c770) Line 1012 + 0x26 bytes C++
> clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field * field=0x04d2a9e0, lucene::analysis::Analyzer * analyzer=0x010a5fa0, const int maxFieldLength=0x00002710) Line 902 C++
> clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 797 C++
> clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 554 + 0x1a bytes C++
> clucene-cored.dll!lucene::index::DocumentsWriter::updateDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0, lucene::index::Term * delTerm=0x00000000) Line 934 + 0xc bytes C++
> clucene-cored.dll!lucene::index::DocumentsWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 919 C++
> clucene-cored.dll!lucene::index::IndexWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 670 + 0x13 bytes C++
> clucene-cored.dll!lucene::index::IndexModifier::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * docAnalyzer=0x010a5fa0) Line 100 C++
> mkidx.exe!tovek::index::Index::indexDocument(tovek::index::Document & doc={...}, bool bInsert=false, unsigned long & ulPrevDoc=0x00000007, tovek::analysis::CachedAnalyzer * pCachedAnalyzer=0x010a5fa0) Line 472 C++
>
>
> Problematic item in PostingHash:
> =========================
>
> - threadState->p 0x02538fd8 {textStart=0xfeeefeee docFreq=0xfeeefeee freqStart=0xfeeefeee ...} lucene::index::DocumentsWriter::Posting *
>     textStart    0xfeeefeee int
>     docFreq      0xfeeefeee int
>     freqStart    0xfeeefeee int
>     freqUpto     0xfeeefeee int
>     proxStart    0xfeeefeee int
>     proxUpto     0xfeeefeee int
>     lastDocID    0xfeeefeee int
>     lastDocCode  0xfeeefeee int
>     lastPosition 0xfeeefeee int
>   + vector       0xfeeefeee {p=??? lastOffset=??? offsetStart=??? ...} lucene::index::DocumentsWriter::PostingVector *
>
> ------------------------------------------------------------------------------
> ThinkGeek and WIRED's GeekDad team up for the Ultimate
> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky
> parental unit. See the prize list and enter to win:
> http://p.sf.net/sfu/thinkgeek-promo
> _______________________________________________
> CLucene-developers mailing list
> CLucene-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/clucene-developers
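
A note on the numbers in the dump, offered as a sketch under stated assumptions rather than a confirmed diagnosis. The out-of-range subscript _Pos=0xfffffbbb is what you would get if postingEquals() picks a char-pool buffer by shifting textStart=0xfeeefeee right by 14 bits. The shift of 14 is an assumption taken from CHAR_BLOCK_SHIFT in Lucene 2.3's DocumentsWriter; please check the value in your CLucene tree. 0xFEEEFEEE is, as far as I know, the pattern the Windows heap writes over freed blocks when heap debugging is active, which fits the "already deleted object" reading. The names below (kCharBlockShift, blockIndex, assertNotFreed) are hypothetical and not CLucene API.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: 14 mirrors CHAR_BLOCK_SHIFT in Lucene 2.3's DocumentsWriter.
constexpr int kCharBlockShift = 14;
constexpr std::int64_t kBlockSize = std::int64_t{1} << kCharBlockShift;

// Pattern seen in every member of the dumped Posting.
constexpr std::uint32_t kFreedFill = 0xFEEEFEEEu;

// Reinterpret a 32-bit pattern as a signed int without relying on
// implementation-defined narrowing conversions.
constexpr std::int32_t asSigned(std::uint32_t v) {
    return v < 0x80000000u
        ? static_cast<std::int32_t>(v)
        : static_cast<std::int32_t>(static_cast<std::int64_t>(v) - 4294967296LL);
}

// Floor division by the block size. This matches an arithmetic right
// shift by kCharBlockShift, including for negative inputs.
constexpr std::int32_t blockIndex(std::int32_t textStart) {
    return static_cast<std::int32_t>(
        textStart >= 0 ? textStart / kBlockSize
                       : -((kBlockSize - 1 - textStart) / kBlockSize));
}

// The freed-memory pattern maps to buffer index -1093, which is
// 0xfffffbbb when passed as an unsigned subscript.
static_assert(blockIndex(asSigned(kFreedFill)) == -1093, "unexpected index");
static_assert(static_cast<std::uint32_t>(blockIndex(asSigned(kFreedFill)))
                  == 0xFFFFFBBBu,
              "does not match _Pos from the call stack");

// Hypothetical debug-build guard that could be called just before
// postingEquals() to stop closer to the point of corruption.
inline void assertNotFreed(std::int32_t textStart) {
    assert(static_cast<std::uint32_t>(textStart) != kFreedFill);
}
```

If the arithmetic holds for your build, the "vector subscript out of range" exception is only a symptom. The thing to trace is where the Posting objects referenced by postingsHash are released (or their pool recycled) while the hash still holds pointers to them.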