Hi,

This seems to be the same error reported by Klemens Friedl last week [1].

I can confirm your findings. After setting the demo application to index the
Reuters corpus distributed with CLucene (see my patch to master today), and
setting maxFieldLength to MAX_INT, the application fails on one of the
files (for me it was reut2-002.sgm). The call stack points to
DocumentsWriterThreadState.cpp line 1142, where threadState->p is pointing to
freed or invalid memory.
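
A note on the 0xfeeefeee values in the quoted dump below: that is the pattern
the Windows debug heap writes over freed blocks, which is consistent with
threadState->p pointing to freed memory. While tracing, a small heuristic
guard along the following lines could be called right before postingEquals().
This is only a sketch under assumptions: PostingLike, looksFreed and
assertNotFreed are hypothetical names, not part of CLucene, and since reading
freed memory is undefined behaviour this is a debug-build aid, not a fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fill pattern the Windows debug heap writes over freed blocks.
static const std::uint32_t kFreedFill = 0xFEEEFEEEu;

// Hypothetical stand-in for the leading int members of
// DocumentsWriter::Posting (assumption: not the real CLucene layout).
struct PostingLike {
    std::int32_t textStart;
    std::int32_t docFreq;
    std::int32_t freqStart;
};

// Returns true when every complete 32-bit word in [p, p + bytes) equals the
// freed-memory fill pattern. Heuristic only: live data could in principle
// hold the same value, and a freed block may already have been reused.
inline bool looksFreed(const void* p, std::size_t bytes) {
    if (p == NULL || bytes < sizeof(std::uint32_t)) {
        return false;
    }
    const unsigned char* src = static_cast<const unsigned char*>(p);
    for (std::size_t off = 0; off + sizeof(std::uint32_t) <= bytes;
         off += sizeof(std::uint32_t)) {
        std::uint32_t word;
        std::memcpy(&word, src + off, sizeof word);  // avoids aliasing issues
        if (word != kFreedFill) {
            return false;
        }
    }
    return true;
}

// Example guard: call before dereferencing a pointer fetched from a hash slot.
inline void assertNotFreed(const PostingLike* p) {
    assert(!looksFreed(p, sizeof *p));
}
```

If the assertion fires at the hash lookup, a breakpoint there should show
which slot still holds the stale pointer.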

Unfortunately at the moment I cannot work on tracing this properly. If you
can do this yourself, I'll be happy to assist with whatever I can.

Itamar.

[1] http://comments.gmane.org/gmane.comp.jakarta.lucene.clucene.devel/3449
Also see
http://sourceforge.net/tracker/?func=detail&aid=2981449&group_id=80013&atid=558446

> -----Original Message-----
> From: Kostka Bořivoj [mailto:kos...@tovek.cz] 
> Sent: Monday, June 21, 2010 2:50 PM
> To: clucene-developers@lists.sourceforge.net
> Subject: [CLucene-dev] vector subscript out of range
> exception during indexing
> 
> While indexing a set of documents (about 10000 already
> indexed) I get the exception "vector subscript out of range"
> from ArrayBase operator[].
> I did some research, and it seems to happen because the
> threadState->postingEquals() method is called with an invalid
> threadState->p set.
> postingsHash[hashPos] probably contains a pointer to an
> already deleted object, as 0xfeee appears in all members (I'm
> running it under the MSVC 2005 debugger).
> See the call stack and threadState->p dump below.
> 
> Source (documentswriterthreadstate.cpp:1010)
> ======
> 
>   // Locate Posting in hash
>   threadState->p = postingsHash[hashPos];
> 
>   if (threadState->p != NULL &&
>       !threadState->postingEquals(tokenText, tokenTextLen)) { ...
> 
> 
> Call stack
> ========
> clucene-cored.dll!lucene::util::ArrayBase<wchar_t *>::operator[](unsigned int _Pos=0xfffffbbb)  Line 92 C++
> clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::postingEquals(const wchar_t * tokenText=0x032772a8, const int tokenTextLen=0x00000008)  Line 577 + 0x25 bytes C++
> clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::addPosition(lucene::analysis::Token * token=0x0100c770)  Line 1012 + 0x26 bytes C++
> clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field * field=0x04d2a9e0, lucene::analysis::Analyzer * analyzer=0x010a5fa0, const int maxFieldLength=0x00002710)  Line 902 C++
> clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 797 C++
> clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 554 + 0x1a bytes C++
> clucene-cored.dll!lucene::index::DocumentsWriter::updateDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0, lucene::index::Term * delTerm=0x00000000)  Line 934 + 0xc bytes C++
> clucene-cored.dll!lucene::index::DocumentsWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 919 C++
> clucene-cored.dll!lucene::index::IndexWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 670 + 0x13 bytes C++
> clucene-cored.dll!lucene::index::IndexModifier::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * docAnalyzer=0x010a5fa0)  Line 100 C++
> mkidx.exe!tovek::index::Index::indexDocument(tovek::index::Document & doc={...}, bool bInsert=false, unsigned long & ulPrevDoc=0x00000007, tovek::analysis::CachedAnalyzer * pCachedAnalyzer=0x010a5fa0)  Line 472 C++
> 
> 
> Problematic item in PostingHash:
> =========================
> 
> -             threadState->p  0x02538fd8 {textStart=0xfeeefeee docFreq=0xfeeefeee freqStart=0xfeeefeee ...}  lucene::index::DocumentsWriter::Posting *
>               textStart       0xfeeefeee      int
>               docFreq 0xfeeefeee      int
>               freqStart       0xfeeefeee      int
>               freqUpto        0xfeeefeee      int
>               proxStart       0xfeeefeee      int
>               proxUpto        0xfeeefeee      int
>               lastDocID       0xfeeefeee      int
>               lastDocCode     0xfeeefeee      int
>               lastPosition    0xfeeefeee      int
> +             vector  0xfeeefeee {p=??? lastOffset=??? offsetStart=??? ...}  lucene::index::DocumentsWriter::PostingVector *
> 
> --------------------------------------------------------------
> ----------------
> ThinkGeek and WIRED's GeekDad team up for the Ultimate 
> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky 
> parental unit.  See the prize list and enter to win: 
> http://p.sf.net/sfu/thinkgeek-promo
> _______________________________________________
> CLucene-developers mailing list
> CLucene-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/clucene-developers
> 

