In IndexWriter.h (line 1163) there are a few functions marked as being for test purposes only. From what I could tell, they are not being accessed from anywhere right now.
Your options as I see them are:

* Make them public (I'm not sure how Java gets around that one without doing this)
* Subclass IndexWriter in the test suite and make them available only under it
* "Friend" the classes

Decide which to do based on the way JL uses them (apparently we aren't using them at all at the moment, so don't look at CL for this). If it is possible to make this code available from within the test suite alone, I'd definitely prefer to compile those out of the core's IndexWriter. "Friend"ing is probably not possible without putting test code in CL, and as I said, the core is better left without it.

HTH

Itamar.

> -----Original Message-----
> From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> Sent: Thursday, June 24, 2010 12:22 AM
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] vector subscript out of range exception duringindexing
>
> I started porting the test, but I have a problem with private/protected methods. Some JLucene methods are used in tests but marked private in CLucene, e.g.
>
>   IndexWriter writer = new IndexWriter(dir, analyzer, true);
>   writer.addDocument(testDoc);
>   writer.flush();
>   SegmentInfo info = writer.newestSegment();
>
> can be easily ported to
>
>   IndexWriter * writer = _CLNEW IndexWriter(dir, analyzer, true);
>   writer->addDocument(&testDoc);
>   writer->flush();
>   SegmentInfo * info = writer->newestSegment();
>
> But the newestSegment method is private, so the test cannot be compiled.
>
> Any hint how to go around that?
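
For what it's worth, the subclassing option could look roughly like the sketch below. It assumes newestSegment can be changed from private to protected in the core. The IndexWriter and SegmentInfo classes here are simplified stand-ins, not the real CLucene classes, and TestableIndexWriter is a hypothetical name that would live in the test suite only:

```cpp
#include <string>
#include <vector>

// Stand-in types for illustration only; these are NOT the real CLucene classes.
struct SegmentInfo {
    std::string name;
};

class IndexWriter {
public:
    virtual ~IndexWriter() {}

    // Pretend every added document produces a new segment.
    void addDocument(const std::string& doc) {
        (void)doc;
        segments.push_back(SegmentInfo{"_" + std::to_string(segments.size())});
    }

protected:
    // Assumption: declared protected (not private) so that a subclass can reach it.
    SegmentInfo* newestSegment() {
        return segments.empty() ? nullptr : &segments.back();
    }

private:
    std::vector<SegmentInfo> segments;
};

// Hypothetical test-suite class; the core library never sees it.
class TestableIndexWriter : public IndexWriter {
public:
    // Re-export the protected member with public access for the tests.
    using IndexWriter::newestSegment;
};
```

A test would then construct TestableIndexWriter instead of IndexWriter and call newestSegment() on it directly, while the core's public interface stays unchanged.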
>
> Borek
>
> > -----Original Message-----
> > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > Sent: Wednesday, June 23, 2010 5:00 PM
> > To: clucene-developers@lists.sourceforge.net
> > Subject: Re: [CLucene-dev] vector subscript out of range exception duringindexing
> >
> > I'll try to port whole TestDocumentsWriter, it is not so big
> >
> > > -----Original Message-----
> > > From: Itamar Syn-Hershko [mailto:ita...@code972.com]
> > > Sent: Wednesday, June 23, 2010 12:39 PM
> > > To: clucene-developers@lists.sourceforge.net
> > > Subject: Re: [CLucene-dev] vector subscript out of range exception duringindexing
> > >
> > > Use Java Lucene 2.3.2, which the git master branch is based on. Grab it from http://archive.apache.org/dist/lucene/java/, or you can use tools like Krugle to read the code on-line.
> > >
> > > You may only need this to port TestDocumentsWriter as a whole. To fix this specific issue I think it is enough to follow the patch attached to the JIRA issue. I'm not sure it was deployed to the 2.3.2 sources, btw.
> > >
> > > Itamar.
> > >
> > > > -----Original Message-----
> > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > Sent: Wednesday, June 23, 2010 12:10 PM
> > > > To: clucene-developers@lists.sourceforge.net
> > > > Subject: Re: [CLucene-dev] vector subscript out of range exception duringindexing
> > > >
> > > > I'm not sure which JLucene version I should use (and where to get it)
> > > >
> > > > Borek
> > > >
> > > > > -----Original Message-----
> > > > > From: Itamar Syn-Hershko [mailto:ita...@code972.com]
> > > > > Sent: Wednesday, June 23, 2010 12:11 AM
> > > > > To: clucene-developers@lists.sourceforge.net
> > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception duringindexing
> > > > >
> > > > > Those are the postings array and its staging area for flushing. Once flushed, a Posting object can be deleted.
> > > > >
> > > > > The code you quoted is originally written in Java as:
> > > > >
> > > > >   Arrays.fill(postingsFreeList, postingsFreeCount-numToFree, postingsFreeCount, null);
> > > > >
> > > > > Meaning, this is not a deletion but rather a nullification. This may actually be proper behavior for Java, since its garbage collector tracks references to all objects. However, it seems to have caused issues with JLucene as well for documents with many terms: https://issues.apache.org/jira/browse/LUCENE-1072. The only question is how come we haven't seen this until now, and what's special about the reuters corpus?
> > > > >
> > > > > I think, if you could port TestDocumentsWriter to cl_test (at least the relevant test case they have added) and check if it crashes with the same characteristics as your issue, we could verify this is the same issue. Then we can apply their patch (while following the JIRA discussion) accordingly to DocumentsWriter.cpp.
> > > > >
> > > > > Itamar.
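
To make the difference concrete, here is a rough C++ sketch of what a direct counterpart of that Arrays.fill call might look like when the slots also have to be freed by hand. It only touches the half-open range [postingsFreeCount - numToFree, postingsFreeCount), the same bounds the Java call uses, whereas the C++ loop quoted further down runs up to the array length. Posting is a stand-in struct and releaseFromFreeList is a hypothetical helper, not CLucene code. Whether slots past the free count can still hold pointers that are in use elsewhere is an assumption I have not verified against the sources:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for illustration; not the real DocumentsWriter::Posting.
struct Posting {
    int textStart;
};

// Hypothetical helper: deletes the last numToFree live entries of the free
// list and nulls their slots. Only indices in
// [freeCount - numToFree, freeCount) are touched; anything at or beyond
// freeCount is left alone. Returns the new free count.
std::size_t releaseFromFreeList(std::vector<Posting*>& freeList,
                                std::size_t freeCount,
                                std::size_t numToFree) {
    if (numToFree > freeCount) {
        numToFree = freeCount;  // never walk below index 0
    }
    for (std::size_t i = freeCount - numToFree; i < freeCount; ++i) {
        delete freeList[i];     // C++ has no GC, so free explicitly
        freeList[i] = nullptr;  // mirrors the Java nullification
    }
    return freeCount - numToFree;
}
```

The point of bounding the loop by the free count rather than by the array length is that a slot outside the live range is never deleted, so a pointer that is also stored somewhere else (for example in a hash table) would not be invalidated by this routine.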
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > > > Sent: Tuesday, June 22, 2010 11:53 PM
> > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception duringindexing
> > > > > >
> > > > > > I did some research and found the following:
> > > > > >
> > > > > > The problem is caused by the freeing cycle in balanceRAM() (documentswriter.cpp:1325)
> > > > > >
> > > > > >   for ( size_t i = this->postingsFreeCountDW-numToFree; i < this->postingsFreeListDW.length; i++ ){
> > > > > >     _CLDELETE(this->postingsFreeListDW.values[i]);
> > > > > >   }
> > > > > >
> > > > > > Because this->postingsFreeListDW.values contains pointers which are also used in the postingsHash table, the _CLDELETE makes them invalid.
> > > > > >
> > > > > > So the main question is why Posting objects referenced in postingsHash are also referenced by postingsFreeListDW.
> > > > > >
> > > > > > So far I have not been able to find the reason.
> > > > > >
> > > > > > Borek
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Itamar Syn-Hershko [mailto:ita...@divrei-tora.com]
> > > > > > > Sent: Monday, June 21, 2010 2:08 PM
> > > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception duringindexing
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > This seems to be the same error reported by Klemens Friedl last week [1].
> > > > > > >
> > > > > > > I can confirm your findings.
> > > > > > > After setting the demo application to index the reuters corpora distributed with CLucene (see my patch to master today), and setting maxFieldLength to MAX_INT, the application is failing on one of the files (for me it was reut2-002.sgm). The call stack points to DocumentsWriterThreadState.cpp ln 1142, where threadState->p is pointing to freed or invalid memory.
> > > > > > >
> > > > > > > Unfortunately at the moment I cannot work on tracing this properly. If you can do this yourself, I'll be happy to assist with whatever I can.
> > > > > > >
> > > > > > > Itamar.
> > > > > > >
> > > > > > > [1] http://comments.gmane.org/gmane.comp.jakarta.lucene.clucene.devel/3449 . Also see
> > > > > > > http://sourceforge.net/tracker/?func=detail&aid=2981449&group_id=80013&atid=558446.
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > > > > Sent: Monday, June 21, 2010 2:50 PM
> > > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > Subject: [CLucene-dev] vector subscript out of range exception duringindexing
> > > > > > >
> > > > > > > While indexing a set of documents (about 10000 already indexed) I get the exception "vector subscript out of range" from ArrayBase operator [ ].
> > > > > > > I did some research and it seems it is because the threadState->postingEquals() method is called with an invalid p set.
> > > > > > > The postingsHash[hashPos] probably contains a pointer to an already deleted object, as 0xfeee is in all members (I'm running it under the MSVC 2005 Debugger).
> > > > > > > See call stack and threadState->p dump below.
> > > > > > >
> > > > > > > Source (documentswriterthreadstate.cpp:1010)
> > > > > > > ======
> > > > > > >
> > > > > > >   // Locate Posting in hash
> > > > > > >   threadState->p = postingsHash[hashPos];
> > > > > > >
> > > > > > >   if (threadState->p != NULL && !threadState->postingEquals(tokenText, tokenTextLen)) { ...
> > > > > > >
> > > > > > > Call stack
> > > > > > > ========
> > > > > > > clucene-cored.dll!lucene::util::ArrayBase<wchar_t *>::operator[](unsigned int _Pos=0xfffffbbb) Line 92 C++
> > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::postingEquals(const wchar_t * tokenText=0x032772a8, const int tokenTextLen=0x00000008) Line 577 + 0x25 bytes C++
> > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::addPosition(lucene::analysis::Token * token=0x0100c770) Line 1012 + 0x26 bytes C++
> > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field * field=0x04d2a9e0, lucene::analysis::Analyzer * analyzer=0x010a5fa0, const int maxFieldLength=0x00002710) Line 902 C++
> > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 797 C++
> > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 554 + 0x1a bytes C++
> > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::updateDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0, lucene::index::Term * delTerm=0x00000000) Line 934 + 0xc bytes C++
> > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 919 C++
> > > > > > > clucene-cored.dll!lucene::index::IndexWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 670 + 0x13 bytes C++
> > > > > > > clucene-cored.dll!lucene::index::IndexModifier::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * docAnalyzer=0x010a5fa0) Line 100 C++
> > > > > > > mkidx.exe!tovek::index::Index::indexDocument(tovek::index::Document & doc={...}, bool bInsert=false, unsigned long & ulPrevDoc=0x00000007, tovek::analysis::CachedAnalyzer * pCachedAnalyzer=0x010a5fa0) Line 472 C++
> > > > > > >
> > > > > > > Problematic item in PostingHash:
> > > > > > > =========================
> > > > > > >
> > > > > > > - threadState->p 0x02538fd8 {textStart=0xfeeefeee docFreq=0xfeeefeee freqStart=0xfeeefeee ...} lucene::index::DocumentsWriter::Posting *
> > > > > > >     textStart 0xfeeefeee int
> > > > > > >     docFreq 0xfeeefeee int
> > > > > > >     freqStart 0xfeeefeee int
> > > > > > >     freqUpto 0xfeeefeee int
> > > > > > >     proxStart 0xfeeefeee int
> > > > > > >     proxUpto 0xfeeefeee int
> > > > > > >     lastDocID 0xfeeefeee int
> > > > > > >     lastDocCode 0xfeeefeee int
> > > > > > >     lastPosition 0xfeeefeee int
> > > > > > >   + vector 0xfeeefeee {p=??? lastOffset=??? offsetStart=???
> > > > > > >     ...} lucene::index::DocumentsWriter::PostingVector *
> > > > > > >
> > > > > > > ------------------------------------------------------------------------------
> > > > > > > ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win:
> > > > > > > http://p.sf.net/sfu/thinkgeek-promo
> > > > > > > _______________________________________________
> > > > > > > CLucene-developers mailing list
> > > > > > > CLucene-developers@lists.sourceforge.net
> > > > > > > https://lists.sourceforge.net/lists/listinfo/clucene-developers