I don't understand how subclassing can help: the member in the base class is private, so it isn't accessible even to derived classes.
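To make the point concrete, here is a minimal sketch. It uses hypothetical names (`Writer`, `TestWriter`, `newestSegmentId`, `hiddenCounter`) and is not the actual CLucene declaration of IndexWriter. It only illustrates the general C++ rule: a test-only subclass can re-export a protected member with a using-declaration, but it cannot do the same for a private one.

```cpp
// Hypothetical stand-in for a core class; NOT the real CLucene IndexWriter.
class Writer {
public:
    virtual ~Writer() {}
protected:
    // Protected: hidden from normal callers, but visible to subclasses.
    int newestSegmentId() const { return 42; }
private:
    // Private: not visible to subclasses either.
    int hiddenCounter() const { return 7; }
};

// Test-only subclass that would live in the test suite, not in the core.
class TestWriter : public Writer {
public:
    using Writer::newestSegmentId;   // re-export the protected member as public
    // using Writer::hiddenCounter;  // would not compile: the member is private
};

// Helper showing that test code can now call the re-exported method.
inline int readNewestSegmentId() {
    TestWriter w;
    return w.newestSegmentId();
}
```

So the subclassing option would only work if the test-only methods were changed from private to protected in the core header, which is still a change to the core, though a smaller one than making them public.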
I'm not a friend of friend classes (it seems to me an ugly technique which breaks encapsulation), and it also needs changes to DocumentWriter. So the only way I see is to make the method public. I'm not very happy doing so, but I cannot see any other way...

Borek

> -----Original Message-----
> From: Itamar Syn-Hershko [mailto:ita...@code972.com]
> Sent: Thursday, June 24, 2010 12:11 AM
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
>
> In IndexWriter.h (line 1163) there are a few functions marked as being for
> test purposes only. From what I could tell, they are not being accessed from
> anywhere right now.
>
> Your options as I see them are:
>
> * Make them public (I'm not sure how Java gets around that one without doing
>   this)
> * Subclass IndexWriter in the test suite and make them available only under
>   it
> * "Friend" the classes
>
> Decide which to do based on the way JL uses them (apparently we aren't using
> them at all at the moment, so don't look at CL for this). If it is possible
> to make this code available from within the test suite alone, I'd definitely
> prefer to compile those out of the core's IndexWriter. "Friend"ing is
> probably not possible to do without putting test code in CL, which as I said
> - the core is better left without.
>
> HTH
>
> Itamar.
>
> > -----Original Message-----
> > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > Sent: Thursday, June 24, 2010 12:22 AM
> > To: clucene-developers@lists.sourceforge.net
> > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> >
> > I started porting the test, but I have a problem with
> > private/protected methods. Some JLucene methods are used in
> > tests but marked private in CLucene, e.g.
> >
> >   IndexWriter writer = new IndexWriter(dir, analyzer, true);
> >   writer.addDocument(testDoc);
> >   writer.flush();
> >   SegmentInfo info = writer.newestSegment();
> >
> > can be easily ported to
> >
> >   IndexWriter * writer = _CLNEW IndexWriter(dir, analyzer, true);
> >   writer->addDocument(&testDoc);
> >   writer->flush();
> >   SegmentInfo * info = writer->newestSegment();
> >
> > But the newestSegment method is private, so the test cannot be compiled.
> >
> > Any hint how to get around that?
> >
> > Borek
> >
> > > -----Original Message-----
> > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > Sent: Wednesday, June 23, 2010 5:00 PM
> > > To: clucene-developers@lists.sourceforge.net
> > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > >
> > > I'll try to port the whole TestDocumentsWriter; it is not so big.
> > >
> > > > -----Original Message-----
> > > > From: Itamar Syn-Hershko [mailto:ita...@code972.com]
> > > > Sent: Wednesday, June 23, 2010 12:39 PM
> > > > To: clucene-developers@lists.sourceforge.net
> > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > >
> > > > Use Java Lucene 2.3.2, which the git master branch is based on. Grab
> > > > it from http://archive.apache.org/dist/lucene/java/, or you can use
> > > > tools like Krugle to read the code online.
> > > >
> > > > You may only need this to port TestDocumentsWriter as a whole. To
> > > > fix this specific issue I think it is enough to follow the patch
> > > > attached to the JIRA issue. I'm not sure it was deployed to the
> > > > 2.3.2 sources, btw.
> > > >
> > > > Itamar.
> > > >
> > > > > -----Original Message-----
> > > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > > Sent: Wednesday, June 23, 2010 12:10 PM
> > > > > To: clucene-developers@lists.sourceforge.net
> > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > >
> > > > > I'm not sure which JLucene version I should use (and where to get
> > > > > it).
> > > > >
> > > > > Borek
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Itamar Syn-Hershko [mailto:ita...@code972.com]
> > > > > > Sent: Wednesday, June 23, 2010 12:11 AM
> > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > >
> > > > > > Those are the postings array and its staging area for flushing.
> > > > > > Once flushed, a Posting object can be deleted.
> > > > > >
> > > > > > The code you quoted is originally written in Java as:
> > > > > >
> > > > > >   Arrays.fill(postingsFreeList, postingsFreeCount-numToFree, postingsFreeCount, null);
> > > > > >
> > > > > > Meaning, this is not a deletion but rather a nullification. This
> > > > > > may actually be proper behavior for Java, since its garbage
> > > > > > collector keeps track of references to all objects. However, it
> > > > > > seems to have caused issues with JLucene as well for documents
> > > > > > with many terms:
> > > > > > https://issues.apache.org/jira/browse/LUCENE-1072. The only
> > > > > > question is how come we haven't seen this until now, and what's
> > > > > > special about the Reuters corpus?
> > > > > >
> > > > > > I think, if you could port TestDocumentsWriter to cl_test (at
> > > > > > least the relevant test case they have added) and check if it
> > > > > > crashes with the same characteristics as your issue, we could
> > > > > > verify this is the same issue.
> > > > > > Then we can apply their patch (while following the JIRA
> > > > > > discussion) accordingly to DocumentsWriter.cpp.
> > > > > >
> > > > > > Itamar.
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > > > > Sent: Tuesday, June 22, 2010 11:53 PM
> > > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > > >
> > > > > > > I did some research and found the following:
> > > > > > >
> > > > > > > The problem is caused by the freeing loop in balanceRAM()
> > > > > > > (documentswriter.cpp:1325)
> > > > > > >
> > > > > > >   for ( size_t i = this->postingsFreeCountDW-numToFree; i < this->postingsFreeListDW.length; i++ ){
> > > > > > >       _CLDELETE(this->postingsFreeListDW.values[i]);
> > > > > > >   }
> > > > > > >
> > > > > > > Because this->postingsFreeListDW.values contains pointers which
> > > > > > > are also used in the postingsHash table, the _CLDELETE makes
> > > > > > > them invalid.
> > > > > > >
> > > > > > > So the main question is why Posting objects referenced in
> > > > > > > postingsHash are also referenced by postingsFreeListDW.
> > > > > > >
> > > > > > > So far I have not been able to find the reason.
> > > > > > >
> > > > > > > Borek
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Itamar Syn-Hershko [mailto:ita...@divrei-tora.com]
> > > > > > > > Sent: Monday, June 21, 2010 2:08 PM
> > > > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > This seems to be the same error reported by Klemens Friedl
> > > > > > > > last week [1].
> > > > > > > >
> > > > > > > > I can confirm your findings.
> > > > > > > > After setting the demo application to index the Reuters
> > > > > > > > corpus distributed with CLucene (see my patch to master
> > > > > > > > today), and setting maxFieldLength to MAX_INT, the
> > > > > > > > application fails on one of the files (for me it was
> > > > > > > > reut2-002.sgm). The call stack points to
> > > > > > > > DocumentsWriterThreadState.cpp line 1142, where
> > > > > > > > threadState->p is pointing to freed or invalid memory.
> > > > > > > >
> > > > > > > > Unfortunately at the moment I cannot work on tracing this
> > > > > > > > properly. If you can do this yourself, I'll be happy to
> > > > > > > > assist with whatever I can.
> > > > > > > >
> > > > > > > > Itamar.
> > > > > > > >
> > > > > > > > [1] http://comments.gmane.org/gmane.comp.jakarta.lucene.clucene.devel/3449 .
> > > > > > > > Also see
> > > > > > > > http://sourceforge.net/tracker/?func=detail&aid=2981449&group_id=80013&atid=558446.
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > > > > > Sent: Monday, June 21, 2010 2:50 PM
> > > > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > > Subject: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > > > >
> > > > > > > > While indexing a set of documents (about 10000 already
> > > > > > > > indexed) I get the exception "vector subscript out of range"
> > > > > > > > from ArrayBase operator [ ].
> > > > > > > > I did some research and it seems it is because the
> > > > > > > > threadState->postingEquals() method is called with an
> > > > > > > > invalid p set.
> > > > > > > > The postingsHash[hashPos] probably contains a pointer to an
> > > > > > > > already deleted object, as 0xfeee is in all members (I'm
> > > > > > > > running it under the MSVC 2005 debugger).
> > > > > > > > See the call stack and threadState->p dump below.
> > > > > > > >
> > > > > > > > Source (documentswriterthreadstate.cpp:1010)
> > > > > > > > ======
> > > > > > > >
> > > > > > > >   // Locate Posting in hash
> > > > > > > >   threadState->p = postingsHash[hashPos];
> > > > > > > >
> > > > > > > >   if (threadState->p != NULL && !threadState->postingEquals(tokenText, tokenTextLen)) { ...
> > > > > > > >
> > > > > > > > Call stack
> > > > > > > > ========
> > > > > > > > clucene-cored.dll!lucene::util::ArrayBase<wchar_t *>::operator[](unsigned int _Pos=0xfffffbbb) Line 92 C++
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::postingEquals(const wchar_t * tokenText=0x032772a8, const int tokenTextLen=0x00000008) Line 577 + 0x25 bytes C++
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::addPosition(lucene::analysis::Token * token=0x0100c770) Line 1012 + 0x26 bytes C++
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field * field=0x04d2a9e0, lucene::analysis::Analyzer * analyzer=0x010a5fa0, const int maxFieldLength=0x00002710) Line 902 C++
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 797 C++
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 554 + 0x1a bytes C++
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::updateDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0, lucene::index::Term * delTerm=0x00000000) Line 934 + 0xc bytes C++
> > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 919 C++
> > > > > > > > clucene-cored.dll!lucene::index::IndexWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0) Line 670 + 0x13 bytes C++
> > > > > > > > clucene-cored.dll!lucene::index::IndexModifier::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * docAnalyzer=0x010a5fa0) Line 100 C++
> > > > > > > > mkidx.exe!tovek::index::Index::indexDocument(tovek::index::Document & doc={...}, bool bInsert=false, unsigned long & ulPrevDoc=0x00000007, tovek::analysis::CachedAnalyzer * pCachedAnalyzer=0x010a5fa0) Line 472 C++
> > > > > > > >
> > > > > > > > Problematic item in PostingHash:
> > > > > > > > =========================
> > > > > > > >
> > > > > > > > - threadState->p  0x02538fd8 {textStart=0xfeeefeee docFreq=0xfeeefeee freqStart=0xfeeefeee ...}  lucene::index::DocumentsWriter::Posting *
> > > > > > > >     textStart     0xfeeefeee  int
> > > > > > > >     docFreq       0xfeeefeee  int
> > > > > > > >     freqStart     0xfeeefeee  int
> > > > > > > >     freqUpto      0xfeeefeee  int
> > > > > > > >     proxStart     0xfeeefeee  int
> > > > > > > >     proxUpto      0xfeeefeee  int
> > > > > > > >     lastDocID     0xfeeefeee  int
> > > > > > > >     lastDocCode   0xfeeefeee  int
> > > > > > > >     lastPosition  0xfeeefeee  int
> > > > > > > >   + vector        0xfeeefeee {p=??? lastOffset=??? offsetStart=??? ...}  lucene::index::DocumentsWriter::PostingVector *
> > > > > > > >
> > > > > > > > ------------------------------------------------------------------------------
> > > > > > > > ThinkGeek and WIRED's GeekDad team up for the Ultimate
> > > > > > > > GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
> > > > > > > > lucky parental unit.
> > > > > > > > See the prize list and enter to win:
> > > > > > > > http://p.sf.net/sfu/thinkgeek-promo
> > > > > > > > _______________________________________________
> > > > > > > > CLucene-developers mailing list
> > > > > > > > CLucene-developers@lists.sourceforge.net
> > > > > > > > https://lists.sourceforge.net/lists/listinfo/clucene-developers