I did some research and found the following:
The problem is caused by the freeing loop in balanceRAM()
(documentswriter.cpp:1325):
    for ( size_t i = this->postingsFreeCountDW-numToFree;
          i < this->postingsFreeListDW.length; i++ ){
        _CLDELETE(this->postingsFreeListDW.values[i]);
    }
Because this->postingsFreeListDW.values contains pointers that are also stored
in the postingsHash table, _CLDELETE leaves those hash entries pointing to
freed memory.
So the main question is why Posting objects referenced in postingsHash are
also referenced by postingsFreeListDW.
So far I have not been able to find the reason.
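
One way to narrow this down might be to check, just before the freeing loop runs, whether any pointer in the range about to be deleted is still stored in postingsHash. The sketch below is not CLucene code: Posting is a hypothetical stand-in for the real class, findAliasedPostings is an invented helper, and std::vector is assumed in place of the CLucene array types.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for lucene::index::DocumentsWriter::Posting.
struct Posting {
    int textStart = 0;
};

// Returns the indices in freeList[first, last) whose pointers are also
// stored in hashTable. NULL slots in either container are ignored.
// Deleting any of the reported entries would leave a dangling pointer
// in the hash table.
std::vector<std::size_t> findAliasedPostings(
    const std::vector<Posting*>& freeList,
    std::size_t first, std::size_t last,
    const std::vector<Posting*>& hashTable)
{
    // Collect every non-NULL pointer the hash table still refers to.
    std::unordered_set<const Posting*> live;
    for (const Posting* p : hashTable) {
        if (p != nullptr) {
            live.insert(p);
        }
    }

    // Report free-list slots in the requested range that alias a live entry.
    std::vector<std::size_t> aliased;
    for (std::size_t i = first; i < last && i < freeList.size(); ++i) {
        if (freeList[i] != nullptr && live.count(freeList[i]) != 0) {
            aliased.push_back(i);
        }
    }
    return aliased;
}
```

Calling this with the same bounds as the loop above (postingsFreeCountDW-numToFree up to postingsFreeListDW.length) and asserting that the result is empty should show which free-list slots are shared with the hash. The loop's upper bound is postingsFreeListDW.length rather than postingsFreeCountDW, so if slots at or above postingsFreeCountDW still hold pointers that were handed out earlier, a check like this would flag them. That is only a guess, not a confirmed cause.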
Borek
> -----Original Message-----
> From: Itamar Syn-Hershko [mailto:[email protected]]
> Sent: Monday, June 21, 2010 2:08 PM
> To: [email protected]
> Subject: Re: [CLucene-dev] vector subscript out of range
> exception during indexing
>
> Hi,
>
> This seems to be the same error reported by Klemens Friedl last week [1].
>
> I can confirm your findings. After setting the demo application to index the
> Reuters corpus distributed with CLucene (see my patch to master today) and
> setting maxFieldLength to MAX_INT, the application fails on one of the
> files (for me it was reut2-002.sgm). The call stack points to
> DocumentsWriterThreadState.cpp line 1142, where threadState->p points to
> freed or invalid memory.
>
> Unfortunately at the moment I cannot work on tracing this properly. If you
> can do this yourself, I'll be happy to assist with whatever I can.
>
> Itamar.
>
> [1] http://comments.gmane.org/gmane.comp.jakarta.lucene.clucene.devel/3449
> Also see
> http://sourceforge.net/tracker/?func=detail&aid=2981449&group_id=80013&atid=558446
>
> > -----Original Message-----
> > From: Kostka Bořivoj [mailto:[email protected]]
> > Sent: Monday, June 21, 2010 2:50 PM
> > To: [email protected]
> > Subject: [CLucene-dev] vector subscript out of range
> > exception during indexing
> >
> > While indexing a set of documents (about 10000 already
> > indexed), I get the exception "vector subscript out of range"
> > from ArrayBase operator[].
> > I did some research, and it seems to happen because the
> > threadState->postingEquals() method is called with an invalid
> > threadState->p.
> > postingsHash[hashPos] probably contains a pointer to an
> > already deleted object, as every member holds 0xfeeefeee (I'm
> > running it under the MSVC 2005 debugger).
> > See the call stack and the threadState->p dump below.
> >
> > Source (documentswriterthreadstate.cpp:1010)
> > ======
> >
> > // Locate Posting in hash
> > threadState->p = postingsHash[hashPos];
> >
> > if (threadState->p != NULL &&
> > !threadState->postingEquals(tokenText, tokenTextLen)) { ...
> >
> >
> > Call stack
> > ========
> > clucene-cored.dll!lucene::util::ArrayBase<wchar_t *>::operator[](
> >     unsigned int _Pos=0xfffffbbb) Line 92 C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::postingEquals(
> >     const wchar_t * tokenText=0x032772a8,
> >     const int tokenTextLen=0x00000008) Line 577 + 0x25 bytes C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::addPosition(
> >     lucene::analysis::Token * token=0x0100c770) Line 1012 + 0x26 bytes C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(
> >     lucene::document::Field * field=0x04d2a9e0,
> >     lucene::analysis::Analyzer * analyzer=0x010a5fa0,
> >     const int maxFieldLength=0x00002710) Line 902 C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::processField(
> >     lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 797 C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::processDocument(
> >     lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 554 + 0x1a bytes C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::updateDocument(
> >     lucene::document::Document * doc=0x0012f600,
> >     lucene::analysis::Analyzer * analyzer=0x010a5fa0,
> >     lucene::index::Term * delTerm=0x00000000) Line 934 + 0xc bytes C++
> >
> > clucene-cored.dll!lucene::index::DocumentsWriter::addDocument(
> >     lucene::document::Document * doc=0x0012f600,
> >     lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 919 C++
> >
> > clucene-cored.dll!lucene::index::IndexWriter::addDocument(
> >     lucene::document::Document * doc=0x0012f600,
> >     lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 670 + 0x13 bytes C++
> >
> > clucene-cored.dll!lucene::index::IndexModifier::addDocument(
> >     lucene::document::Document * doc=0x0012f600,
> >     lucene::analysis::Analyzer * docAnalyzer=0x010a5fa0) Line 100 C++
> >
> > mkidx.exe!tovek::index::Index::indexDocument(
> >     tovek::index::Document & doc={...}, bool bInsert=false,
> >     unsigned long & ulPrevDoc=0x00000007,
> >     tovek::analysis::CachedAnalyzer * pCachedAnalyzer=0x010a5fa0) Line 472 C++
> >
> >
> > Problematic item in postingsHash:
> > =================================
> >
> > - threadState->p 0x02538fd8 {textStart=0xfeeefeee docFreq=0xfeeefeee freqStart=0xfeeefeee ...} lucene::index::DocumentsWriter::Posting *
> >     textStart 0xfeeefeee int
> >     docFreq 0xfeeefeee int
> >     freqStart 0xfeeefeee int
> >     freqUpto 0xfeeefeee int
> >     proxStart 0xfeeefeee int
> >     proxUpto 0xfeeefeee int
> >     lastDocID 0xfeeefeee int
> >     lastDocCode 0xfeeefeee int
> >     lastPosition 0xfeeefeee int
> >   + vector 0xfeeefeee {p=??? lastOffset=??? offsetStart=??? ...} lucene::index::DocumentsWriter::PostingVector *
> >
> > --------------------------------------------------------------
> > ----------------
> > ThinkGeek and WIRED's GeekDad team up for the Ultimate
> > GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky
> > parental unit. See the prize list and enter to win:
> > http://p.sf.net/sfu/thinkgeek-promo
> > _______________________________________________
> > CLucene-developers mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/clucene-developers
> >