I'm not sure which JLucene version I should use, or where to get it.

Borek

> -----Original Message-----
> From: Itamar Syn-Hershko [mailto:ita...@code972.com]
> Sent: Wednesday, June 23, 2010 12:11 AM
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> 
> Those are the postings array and its staging area for flushing. Once
> flushed, a Posting object can be deleted.
> 
> The code you quoted is originally written in Java as:
>       Arrays.fill(postingsFreeList, postingsFreeCount-numToFree, postingsFreeCount, null);
> 
> Meaning, this is not a deletion but rather a nullification. This may
> actually be proper behavior for Java, since the garbage collector
> reclaims an object once nothing references it. However, it seems to have
> caused issues with JLucene as well for documents with many terms:
> https://issues.apache.org/jira/browse/LUCENE-1072. The only question is why
> we haven't seen this until now, and what's special about the Reuters corpus?
> 
> I think, if you could port TestDocumentsWriter to cl_test (at least the
> relevant test case they have added) and check if it crashes with the same
> characteristics of your issue, we could verify this is the same issue. Then
> we can apply their patch (while following the JIRA discussion) accordingly
> to DocumentsWriter.cpp.
> 
> Itamar.
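
The nullification described above can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the CLucene code or the LUCENE-1072 patch: `Posting` is a hypothetical stand-in type, and `releaseSlots` and its return value are invented for this example. It clears the same half-open range as the quoted `Arrays.fill` call and deliberately deletes nothing, so any other structure still holding those pointers keeps valid objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for DocumentsWriter::Posting; not the CLucene type.
struct Posting {
    int textStart = 0;
};

// Mirrors Arrays.fill(list, freeCount - numToFree, freeCount, null):
// clears the slots in [freeCount - numToFree, freeCount) and leaves the
// pointed-to objects alone. Returns the reduced free count (an assumption
// of this sketch about how a caller would track it).
std::size_t releaseSlots(std::vector<Posting*>& freeList,
                         std::size_t freeCount,
                         std::size_t numToFree) {
    assert(freeCount <= freeList.size());
    if (numToFree > freeCount) {
        numToFree = freeCount;  // never reach below index 0
    }
    std::fill(freeList.begin() + (freeCount - numToFree),
              freeList.begin() + freeCount,
              static_cast<Posting*>(nullptr));
    return freeCount - numToFree;
}
```

Note that the Java range ends at postingsFreeCount, while the C++ loop quoted further down the thread runs to postingsFreeListDW.length. Whether that difference is related to the crash is not established in this thread.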
> 
> 
> > -----Original Message-----
> > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > Sent: Tuesday, June 22, 2010 11:53 PM
> > To: clucene-developers@lists.sourceforge.net
> > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> >
> > I did some research and found the following:
> >
> > The problem is caused by the freeing loop in balanceRAM()
> > (documentswriter.cpp:1325)
> >
> >         for ( size_t i = this->postingsFreeCountDW-numToFree;
> >               i < this->postingsFreeListDW.length; i++ ){
> >           _CLDELETE(this->postingsFreeListDW.values[i]);
> >         }
> >
> > Because this->postingsFreeListDW.values contains pointers
> > that are also used in the postingsHash table, the _CLDELETE
> > leaves those hash entries dangling.
> >
> > So the main question is why Posting objects referenced in
> > postingsHash are also referenced by postingsFreeListDW.
> >
> > So far I have not been able to find the reason.
> >
> > Borek
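
One way to chase the open question above is to check, just before the freeing loop runs, which slots in the range about to be freed hold a pointer that the hash table also holds. The sketch below is a hypothetical diagnostic, not part of CLucene: `Posting`, `findAliased`, and the use of plain vectors for the free list and the hash table are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for DocumentsWriter::Posting; not the CLucene type.
struct Posting {
    int textStart = 0;
};

// Returns the indices in freeList[from, to) whose non-null pointer also
// appears in hashTable. Any hit means deleting that slot would leave a
// dangling pointer behind in the hash table.
std::vector<std::size_t> findAliased(const std::vector<Posting*>& freeList,
                                     std::size_t from,
                                     std::size_t to,
                                     const std::vector<Posting*>& hashTable) {
    std::unordered_set<const Posting*> live;
    for (const Posting* p : hashTable) {
        if (p != nullptr) {
            live.insert(p);
        }
    }
    if (to > freeList.size()) {
        to = freeList.size();  // clamp to the array bounds
    }
    std::vector<std::size_t> hits;
    for (std::size_t i = from; i < to; ++i) {
        if (freeList[i] != nullptr && live.count(freeList[i]) != 0) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

Logging the returned indices together with the current free count would show whether the aliased slots sit below or above that count, which narrows down where the sharing comes from.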
> >
> >
> >
> > > -----Original Message-----
> > > From: Itamar Syn-Hershko [mailto:ita...@divrei-tora.com]
> > > Sent: Monday, June 21, 2010 2:08 PM
> > > To: clucene-developers@lists.sourceforge.net
> > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > >
> > > Hi,
> > >
> > > > This seems to be the same error reported by Klemens Friedl last week [1].
> > > >
> > > > I can confirm your findings. After setting the demo application to
> > > > index the Reuters corpus distributed with CLucene (see my patch to
> > > > master today), and setting maxFieldLength to MAX_INT, the application
> > > > fails on one of the files (for me it was reut2-002.sgm). The call
> > > > stack points to DocumentsWriterThreadState.cpp ln 1142, where
> > > > threadState->p is pointing to freed or invalid memory.
> > > >
> > > > Unfortunately I cannot work on tracing this properly at the moment.
> > > > If you can do this yourself, I'll be happy to assist with whatever
> > > > I can.
> > >
> > > Itamar.
> > >
> > > [1]
> > http://comments.gmane.org/gmane.comp.jakarta.lucene.clucene.de
> vel/3449 .
> > Also see
> > http://sourceforge.net/tracker/?func=detail&aid=2981449&group_id=80013
> > &atid=
> > 558446.
> >
> > > -----Original Message-----
> > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > Sent: Monday, June 21, 2010 2:50 PM
> > > To: clucene-developers@lists.sourceforge.net
> > > > Subject: [CLucene-dev] vector subscript out of range exception during indexing
> > >
> > > > While indexing a set of documents (about 10000 already
> > > > indexed), I get the exception "vector subscript out of range"
> > > > from ArrayBase operator[].
> > > > I did some research, and it seems to happen because the
> > > > threadState->postingEquals() method is called with an invalid p.
> > > > postingsHash[hashPos] probably contains a pointer to an already
> > > > deleted object, as every member holds 0xfeeefeee (I'm running it
> > > > under the MSVC 2005 debugger).
> > > > See the call stack and threadState->p dump below.
> > >
> > > Source (documentswriterthreadstate.cpp:1010)
> > > ======
> > >
> > >   // Locate Posting in hash
> > >   threadState->p = postingsHash[hashPos];
> > >
> > >   if (threadState->p != NULL &&
> > > !threadState->postingEquals(tokenText, tokenTextLen)) { ...
> > >
> > >
> > > Call stack
> > > ========
> > > > clucene-cored.dll!lucene::util::ArrayBase<wchar_t *>::operator[](unsigned int _Pos=0xfffffbbb)  Line 92     C++
> > > >
> > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::postingEquals(const wchar_t * tokenText=0x032772a8, const int tokenTextLen=0x00000008)  Line 577 + 0x25 bytes       C++
> > > >
> > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::addPosition(lucene::analysis::Token * token=0x0100c770)  Line 1012 + 0x26 bytes C++
> > > >
> > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field * field=0x04d2a9e0, lucene::analysis::Analyzer * analyzer=0x010a5fa0, const int maxFieldLength=0x00002710)  Line 902  C++
> > > >
> > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 797    C++
> > > >
> > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 554 + 0x1a bytes       C++
> > > >
> > > > clucene-cored.dll!lucene::index::DocumentsWriter::updateDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0, lucene::index::Term * delTerm=0x00000000)  Line 934 + 0xc bytes     C++
> > > >
> > > > clucene-cored.dll!lucene::index::DocumentsWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 919       C++
> > > >
> > > > clucene-cored.dll!lucene::index::IndexWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 670 + 0x13 bytes        C++
> > > >
> > > > clucene-cored.dll!lucene::index::IndexModifier::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * docAnalyzer=0x010a5fa0)  Line 100       C++
> > > >
> > > > mkidx.exe!tovek::index::Index::indexDocument(tovek::index::Document & doc={...}, bool bInsert=false, unsigned long & ulPrevDoc=0x00000007, tovek::analysis::CachedAnalyzer * pCachedAnalyzer=0x010a5fa0)  Line 472     C++
> > >
> > >
> > > Problematic item in PostingHash:
> > > =========================
> > >
> > > > -         threadState->p  0x02538fd8 {textStart=0xfeeefeee docFreq=0xfeeefeee freqStart=0xfeeefeee ...}      lucene::index::DocumentsWriter::Posting *
> > > >           textStart       0xfeeefeee      int
> > > >           docFreq 0xfeeefeee      int
> > > >           freqStart       0xfeeefeee      int
> > > >           freqUpto        0xfeeefeee      int
> > > >           proxStart       0xfeeefeee      int
> > > >           proxUpto        0xfeeefeee      int
> > > >           lastDocID       0xfeeefeee      int
> > > >           lastDocCode     0xfeeefeee      int
> > > >           lastPosition    0xfeeefeee      int
> > > > +         vector  0xfeeefeee {p=??? lastOffset=??? offsetStart=??? ...}      lucene::index::DocumentsWriter::PostingVector *
> > >
> > > --------------------------------------------------------------
> > > ----------------
> > > ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad
> > > Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit.
> > > See the prize list and enter to win:
> > > http://p.sf.net/sfu/thinkgeek-promo
> > > _______________________________________________
> > > CLucene-developers mailing list
> > > CLucene-developers@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/clucene-developers
> > >
