In IndexWriter.h (line 1163) there are a few functions marked as being for
test purposes only. From what I could tell, they are not being accessed from
anywhere right now.

Your options as I see them are:

* Make them public (I'm not sure how Java gets around that one without doing
this)
* Subclass IndexWriter in the test suite and make them available only under
it
* "Friend" the classes

Decide which to do based on the way JL uses them (apparently we aren't using
them at all at the moment, so don't look at CL for this). If it is possible
to make this code available from within the test suite alone, I'd definitely
prefer to compile it out of the core's IndexWriter. "Friend"ing is probably
not possible without putting test code in CL, and as I said, the core is
better left without it.
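
A minimal sketch of the subclassing option, in case it helps. The classes
below are simplified stand-ins, not the real CLucene declarations. It assumes
the test-only functions are changed from private to protected in
IndexWriter.h; the names TestableIndexWriter, addSegment and
newestNameAfterTwoSegments are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for SegmentInfo; only a name is kept.
struct SegmentInfo {
    std::string name;
    explicit SegmentInfo(const std::string& n) : name(n) {}
};

// Simplified stand-in for the core IndexWriter (not the real declaration).
class IndexWriter {
public:
    virtual ~IndexWriter() {}

    // Hypothetical helper standing in for addDocument() followed by flush().
    void addSegment(const std::string& name) {
        segments.push_back(SegmentInfo(name));
    }

protected:
    // Test-only accessor. Assumption: declared protected instead of private,
    // so subclasses can reach it while ordinary callers still cannot.
    SegmentInfo* newestSegment() {
        return segments.empty() ? NULL : &segments.back();
    }

private:
    std::vector<SegmentInfo> segments;
};

// Lives in the test suite only: re-exports the protected accessor as public.
class TestableIndexWriter : public IndexWriter {
public:
    using IndexWriter::newestSegment;
};

// Example use from a test case.
std::string newestNameAfterTwoSegments() {
    TestableIndexWriter writer;
    assert(writer.newestSegment() == NULL);  // nothing added yet
    writer.addSegment("_0");
    writer.addSegment("_1");
    return writer.newestSegment()->name;
}
```

With this layout the core only changes an access specifier, and the subclass
itself stays in the test sources.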

HTH

Itamar.

> -----Original Message-----
> From: Kostka Bořivoj [mailto:kos...@tovek.cz] 
> Sent: Thursday, June 24, 2010 12:22 AM
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> 
> I started porting the test, but I have a problem with
> private/protected methods. Some JLucene methods are used in
> tests but are marked private in CLucene, e.g.
> 
>     IndexWriter writer = new IndexWriter(dir, analyzer, true);
>     writer.addDocument(testDoc);
>     writer.flush();
>     SegmentInfo info = writer.newestSegment();
> 
> Can be easily ported to 
> 
>     IndexWriter * writer = _CLNEW IndexWriter(dir, analyzer, true);
>     writer->addDocument(&testDoc);
>     writer->flush();
>     SegmentInfo * info = writer->newestSegment();
> 
> But the newestSegment method is private, so the test cannot be compiled.
> 
> Any hint on how to get around that?
> 
> Borek
> 
> 
> 
> > -----Original Message-----
> > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > Sent: Wednesday, June 23, 2010 5:00 PM
> > To: clucene-developers@lists.sourceforge.net
> > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > 
> > I'll try to port the whole TestDocumentsWriter; it is not so big.
> > 
> > > -----Original Message-----
> > > From: Itamar Syn-Hershko [mailto:ita...@code972.com]
> > > Sent: Wednesday, June 23, 2010 12:39 PM
> > > To: clucene-developers@lists.sourceforge.net
> > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > >
> > > Use Java Lucene 2.3.2, which the git master branch is based on. Grab
> > > it from http://archive.apache.org/dist/lucene/java/, or you can use
> > > tools like Krugle to read the code online.
> > >
> > > You may only need this to port TestDocumentsWriter as a whole. To
> > > fix this specific issue I think it is enough to follow the patch
> > > attached to the JIRA issue. I'm not sure it was applied to the
> > > 2.3.2 sources, btw.
> > >
> > > Itamar.
> > >
> > > > -----Original Message-----
> > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > Sent: Wednesday, June 23, 2010 12:10 PM
> > > > To: clucene-developers@lists.sourceforge.net
> > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > >
> > > > I'm not sure which JLucene version I should use (and where to get
> > > > it)
> > > >
> > > > Borek
> > > >
> > > > > -----Original Message-----
> > > > > From: Itamar Syn-Hershko [mailto:ita...@code972.com]
> > > > > Sent: Wednesday, June 23, 2010 12:11 AM
> > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > >
> > > > > > Those are the postings array and its staging area for
> > > > > > flushing. Once flushed, a Posting object can be deleted.
> > > > > >
> > > > > > The code you quoted was originally written in Java as:
> > > > > >       Arrays.fill(postingsFreeList, postingsFreeCount-numToFree,
> > > > > >                   postingsFreeCount, null);
> > > > > >
> > > > > > Meaning, this is not a deletion but rather a nullification.
> > > > > > This may actually be proper behavior for Java, since its
> > > > > > garbage collector reclaims objects once nothing references
> > > > > > them. However, it seems to have caused issues in JLucene as
> > > > > > well for documents with many terms:
> > > > > > https://issues.apache.org/jira/browse/LUCENE-1072. The only
> > > > > > question is how come we haven't seen this until now, and
> > > > > > what's special about the Reuters corpus?
> > > > > >
> > > > > > I think that if you could port TestDocumentsWriter to cl_test
> > > > > > (at least the relevant test case they have added) and check
> > > > > > whether it crashes with the same characteristics as your
> > > > > > issue, we could verify this is the same issue. Then we can
> > > > > > apply their patch (while following the JIRA discussion)
> > > > > > accordingly to DocumentsWriter.cpp.
> > > > >
> > > > > Itamar.
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > > > Sent: Tuesday, June 22, 2010 11:53 PM
> > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > > >
> > > > > > > I did some research and found the following:
> > > > > > >
> > > > > > > The problem is caused by the freeing loop in balanceRAM()
> > > > > > > (documentswriter.cpp:1325)
> > > > > > >
> > > > > > >         for ( size_t i = this->postingsFreeCountDW-numToFree;
> > > > > > >               i < this->postingsFreeListDW.length; i++ ){
> > > > > > >           _CLDELETE(this->postingsFreeListDW.values[i]);
> > > > > > >         }
> > > > > > >
> > > > > > > Because this->postingsFreeListDW.values contains pointers
> > > > > > > which are also used in the postingsHash table, the
> > > > > > > _CLDELETE makes them invalid.
> > > > > > >
> > > > > > > So the main question is why Posting objects referenced in
> > > > > > > postingsHash are also referenced by postingsFreeListDW.
> > > > > > >
> > > > > > > So far I have not been able to find the reason.
> > > > > >
> > > > > > Borek
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Itamar Syn-Hershko [mailto:ita...@divrei-tora.com]
> > > > > > > Sent: Monday, June 21, 2010 2:08 PM
> > > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > This seems to be the same error reported by Klemens Friedl
> > > > > > > > last week [1].
> > > > > > > >
> > > > > > > > I can confirm your findings. After setting the demo
> > > > > > > > application to index the Reuters corpus distributed with
> > > > > > > > CLucene (see my patch to master today), and setting
> > > > > > > > maxFieldLength to MAX_INT, the application fails on one of
> > > > > > > > the files (for me it was reut2-002.sgm). The call stack
> > > > > > > > points to DocumentsWriterThreadState.cpp ln 1142, where
> > > > > > > > threadState->p is pointing to freed or invalid memory.
> > > > > > > >
> > > > > > > > Unfortunately at the moment I cannot work on tracing this
> > > > > > > > properly. If you can do this yourself, I'll be happy to
> > > > > > > > assist with whatever I can.
> > > > > > > >
> > > > > > > > Itamar.
> > > > > > > >
> > > > > > > > [1] http://comments.gmane.org/gmane.comp.jakarta.lucene.clucene.devel/3449 .
> > > > > > > > Also see
> > > > > > > > http://sourceforge.net/tracker/?func=detail&aid=2981449&group_id=80013&atid=558446.
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > > > > Sent: Monday, June 21, 2010 2:50 PM
> > > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > > Subject: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > > > >
> > > > > > > > While indexing a set of documents (about 10000 already
> > > > > > > > indexed) I get the exception "vector subscript out of range"
> > > > > > > > from ArrayBase operator[].
> > > > > > > > I did some research and it seems it is because the
> > > > > > > > threadState->postingEquals() method is called with an
> > > > > > > > invalid p set.
> > > > > > > > postingsHash[hashPos] probably contains a pointer to an
> > > > > > > > already deleted object, as 0xfeee is in all members (I'm
> > > > > > > > running it under the MSVC 2005 debugger).
> > > > > > > > See the call stack and threadState->p dump below.
> > > > > > >
> > > > > > > Source (documentswriterthreadstate.cpp:1010)
> > > > > > > ======
> > > > > > >
> > > > > > >   // Locate Posting in hash
> > > > > > >   threadState->p = postingsHash[hashPos];
> > > > > > >
> > > > > > > >   if (threadState->p != NULL &&
> > > > > > > >       !threadState->postingEquals(tokenText, tokenTextLen)) { ...
> > > > > > >
> > > > > > >
> > > > > > > Call stack
> > > > > > > ========
> > > > > > > > clucene-cored.dll!lucene::util::ArrayBase<wchar_t *>::operator[](
> > > > > > > >     unsigned int _Pos=0xfffffbbb)  Line 92  C++
> > > > > > > >
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::postingEquals(
> > > > > > > >     const wchar_t * tokenText=0x032772a8,
> > > > > > > >     const int tokenTextLen=0x00000008)  Line 577 + 0x25 bytes  C++
> > > > > > > >
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::addPosition(
> > > > > > > >     lucene::analysis::Token * token=0x0100c770)  Line 1012 + 0x26 bytes  C++
> > > > > > > >
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(
> > > > > > > >     lucene::document::Field * field=0x04d2a9e0,
> > > > > > > >     lucene::analysis::Analyzer * analyzer=0x010a5fa0,
> > > > > > > >     const int maxFieldLength=0x00002710)  Line 902  C++
> > > > > > > >
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::processField(
> > > > > > > >     lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 797  C++
> > > > > > > >
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::processDocument(
> > > > > > > >     lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 554 + 0x1a bytes  C++
> > > > > > > >
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::updateDocument(
> > > > > > > >     lucene::document::Document * doc=0x0012f600,
> > > > > > > >     lucene::analysis::Analyzer * analyzer=0x010a5fa0,
> > > > > > > >     lucene::index::Term * delTerm=0x00000000)  Line 934 + 0xc bytes  C++
> > > > > > > >
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::addDocument(
> > > > > > > >     lucene::document::Document * doc=0x0012f600,
> > > > > > > >     lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 919  C++
> > > > > > > >
> > > > > > > > clucene-cored.dll!lucene::index::IndexWriter::addDocument(
> > > > > > > >     lucene::document::Document * doc=0x0012f600,
> > > > > > > >     lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 670 + 0x13 bytes  C++
> > > > > > > >
> > > > > > > > clucene-cored.dll!lucene::index::IndexModifier::addDocument(
> > > > > > > >     lucene::document::Document * doc=0x0012f600,
> > > > > > > >     lucene::analysis::Analyzer * docAnalyzer=0x010a5fa0)  Line 100  C++
> > > > > > > >
> > > > > > > > mkidx.exe!tovek::index::Index::indexDocument(
> > > > > > > >     tovek::index::Document & doc={...}, bool bInsert=false,
> > > > > > > >     unsigned long & ulPrevDoc=0x00000007,
> > > > > > > >     tovek::analysis::CachedAnalyzer * pCachedAnalyzer=0x010a5fa0)  Line 472  C++
> > > > > > >
> > > > > > >
> > > > > > > Problematic item in PostingHash:
> > > > > > > =========================
> > > > > > >
> > > > > > > -         threadState->p  0x02538fd8
> > > > > > > {textStart=0xfeeefeee docFreq=0xfeeefeee 
> freqStart=0xfeeefeee
> > > > > > > ...}      lucene::index::DocumentsWriter::Posting *
> > > > > > >           textStart       0xfeeefeee      int
> > > > > > >           docFreq 0xfeeefeee      int
> > > > > > >           freqStart       0xfeeefeee      int
> > > > > > >           freqUpto        0xfeeefeee      int
> > > > > > >           proxStart       0xfeeefeee      int
> > > > > > >           proxUpto        0xfeeefeee      int
> > > > > > >           lastDocID       0xfeeefeee      int
> > > > > > >           lastDocCode     0xfeeefeee      int
> > > > > > >           lastPosition    0xfeeefeee      int
> > > > > > > +         vector  0xfeeefeee {p=??? lastOffset=???
> > > > > > > offsetStart=??? ...}
> > > > > lucene::index::DocumentsWriter::PostingVector *
> > > > > > >
> > > > > > > 
> ------------------------------------------------------------
> > > > > > > --
> > > > > > > ----------------
> > > > > > > ThinkGeek and WIRED's GeekDad team up for the Ultimate 
> > > > > > > GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
> > > > > > > lucky
> > > > parental unit.
> > > > > > > See the prize list and enter to win:
> > > > > > > http://p.sf.net/sfu/thinkgeek-promo
> > > > > > > _______________________________________________
> > > > > > > CLucene-developers mailing list 
> > > > > > > CLucene-developers@lists.sourceforge.net
> > > > > > > 
> https://lists.sourceforge.net/lists/listinfo/clucene-develop
> > > > > > > ers
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > 
> ------------------------------------------------------------------
> > > > --
> > > > > > --
> > > > > > -------- ThinkGeek and WIRED's GeekDad team up for the 
> > > > > > Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE 
> PRIZE to 
> > > > > > the lucky parental unit.  See the prize list and 
> enter to win:
> > > > > > http://p.sf.net/sfu/thinkgeek-promo
> > > > > > _______________________________________________
> > > > > > CLucene-developers mailing list 
> > > > > > CLucene-developers@lists.sourceforge.net
> > > > > > 
> https://lists.sourceforge.net/lists/listinfo/clucene-developer
> > > > > > s
> > > > >
> > > > >
> > > > 
> ------------------------------------------------------------------
> > > > ----
> > > > > ------
> > > > > --
> > > > > ThinkGeek and WIRED's GeekDad team up for the 
> Ultimate GeekDad 
> > > > > Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky
> > > > parental unit.
> > > > > See the prize list and enter to win:
> > > > > http://p.sf.net/sfu/thinkgeek-promo
> > > > > _______________________________________________
> > > > > CLucene-developers mailing list
> > > > > CLucene-developers@lists.sourceforge.net
> > > > > 
> https://lists.sourceforge.net/lists/listinfo/clucene-developers
> > > > >
> > > > >
> > > > >
> > > > 
> ------------------------------------------------------------------
> > > > ----
> > > > > -------- ThinkGeek and WIRED's GeekDad team up for 
> the Ultimate 
> > > > > GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
> > > > lucky parental
> > > > > unit.  See the prize list and enter to win:
> > > > > http://p.sf.net/sfu/thinkgeek-promo
> > > > > _______________________________________________
> > > > > CLucene-developers mailing list
> > > > > CLucene-developers@lists.sourceforge.net
> > > > > 
> https://lists.sourceforge.net/lists/listinfo/clucene-developers
> > > >
> > > > --------------------------------------------------------------
> > > > ----------------
> > > > ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad 
> > > > Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental 
> > > > unit.  See the prize list and enter to win:
> > > > http://p.sf.net/sfu/thinkgeek-promo
> > > > _______________________________________________
> > > > CLucene-developers mailing list
> > > > CLucene-developers@lists.sourceforge.net
> > > > https://lists.sourceforge.net/lists/listinfo/clucene-developers
> > > >
> > >
> > >
> > > 
> --------------------------------------------------------------------
> > > ---------- ThinkGeek and WIRED's GeekDad team up for the Ultimate 
> > > GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky 
> > > parental unit.  See the prize list and enter to win:
> > > http://p.sf.net/sfu/thinkgeek-promo
> > > _______________________________________________
> > > CLucene-developers mailing list
> > > CLucene-developers@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/clucene-developers
> > 
> > 
> ----------------------------------------------------------------------
> > -------- ThinkGeek and WIRED's GeekDad team up for the Ultimate 
> > GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
> lucky parental 
> > unit.  See the prize list and enter to win:
> > http://p.sf.net/sfu/thinkgeek-promo
> > _______________________________________________
> > CLucene-developers mailing list
> > CLucene-developers@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/clucene-developers
> 
> --------------------------------------------------------------
> ----------------
> ThinkGeek and WIRED's GeekDad team up for the Ultimate 
> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky 
> parental unit.  See the prize list and enter to win: 
> http://p.sf.net/sfu/thinkgeek-promo
> _______________________________________________
> CLucene-developers mailing list
> CLucene-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/clucene-developers
> 


