What I meant by subclassing is: create a derived TestingIndexWriter in
cl_test and move those test-only methods out of the core and into it (as
public).

I said I like that approach better than making them public in the core,
since they seem to be used only for testing, so there is no need for them to
exist in the core. But if anything else references them in a way that
requires them to be in the core, then that is where they should stay, and
the only place to check for that is the Java sources.
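
Here is a rough sketch of the idea, for illustration only. The IndexWriter
and SegmentInfo below are simplified stand-ins, not the real CLucene
classes, and the flush() body is made up. It also assumes the data the
helper needs is declared protected (not private) in the base class, which
might itself require a small change in the core.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for a core type; NOT the real CLucene SegmentInfo.
struct SegmentInfo {
    std::string name;
};

// Simplified stand-in for the core IndexWriter (hypothetical members).
class IndexWriter {
public:
    virtual ~IndexWriter() {}

    // Made-up behaviour: pretend every flush creates one new segment.
    void flush() {
        SegmentInfo info;
        info.name = "_" + std::to_string(segmentInfos.size());
        segmentInfos.push_back(info);
    }

protected:
    // Assumption: this state is protected, so a derived class can read it.
    std::vector<SegmentInfo> segmentInfos;
};

// Lives in cl_test only: exposes the test-only accessor publicly
// without adding it to the core class.
class TestingIndexWriter : public IndexWriter {
public:
    // Returns the most recently created segment, or nullptr if none exists.
    const SegmentInfo* newestSegment() const {
        return segmentInfos.empty() ? nullptr : &segmentInfos.back();
    }
};
```

A test would then construct a TestingIndexWriter instead of an IndexWriter
and call newestSegment() on it, so the core header never has to expose the
method.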

I'll have a look at your other mail asap.

Itamar.

> -----Original Message-----
> From: Kostka Bořivoj [mailto:kos...@tovek.cz] 
> Sent: Thursday, June 24, 2010 10:54 AM
> To: clucene-developers@lists.sourceforge.net
> Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> 
> I don't understand how subclassing can help: the member in the
> base class is private, so it isn't accessible even to derived classes.
> 
> I'm not a friend of friend classes (they seem to me an ugly
> technique that breaks encapsulation), and that approach also needs
> changes to DocumentWriter.
> 
> So the only way I see is to make the method public. I'm
> not very happy doing so, but I cannot see any other way...
> 
> Borek
> 
> 
> > -----Original Message-----
> > From: Itamar Syn-Hershko [mailto:ita...@code972.com]
> > Sent: Thursday, June 24, 2010 12:11 AM
> > To: clucene-developers@lists.sourceforge.net
> > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > 
> > In IndexWriter.h (line 1163) there are a few functions marked as being
> > for test purposes only. From what I could tell, they are not being
> > accessed from anywhere right now.
> > 
> > Your options as I see them are:
> > 
> > * Make them public (I'm not sure how Java gets around that one without
> >   doing this)
> > * Subclass IndexWriter in the test suite and make them available only
> >   under it
> > * "Friend" the classes
> > 
> > Decide which to do based on the way JL uses them (apparently we aren't
> > using them at all at the moment, so don't look at CL for this). If it
> > is possible to make this code available from within the test suite
> > alone, I'd definitely prefer to compile those out of the core's
> > IndexWriter. "Friend"ing is probably not possible without putting test
> > code in CL, which, as I said, the core is better left without.
> > 
> > HTH
> > 
> > Itamar.
> > 
> > > -----Original Message-----
> > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > Sent: Thursday, June 24, 2010 12:22 AM
> > > To: clucene-developers@lists.sourceforge.net
> > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > >
> > > I started porting the test, but I have a problem with private/protected
> > > methods. Some JLucene methods are used in tests but are marked private
> > > in CLucene, e.g.
> > >
> > >     IndexWriter writer = new IndexWriter(dir, analyzer, true);
> > >     writer.addDocument(testDoc);
> > >     writer.flush();
> > >     SegmentInfo info = writer.newestSegment();
> > >
> > > This can easily be ported to
> > >
> > >     IndexWriter * writer = _CLNEW IndexWriter(dir, analyzer, true);
> > >     writer->addDocument(&testDoc);
> > >     writer->flush();
> > >     SegmentInfo * info = writer->newestSegment();
> > >
> > > But the newestSegment method is private, so the test cannot be compiled.
> > >
> > > Any hint on how to get around that?
> > >
> > > Borek
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > Sent: Wednesday, June 23, 2010 5:00 PM
> > > > To: clucene-developers@lists.sourceforge.net
> > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > >
> > > > > I'll try to port the whole TestDocumentsWriter; it is not that big.
> > > >
> > > > > -----Original Message-----
> > > > > From: Itamar Syn-Hershko [mailto:ita...@code972.com]
> > > > > Sent: Wednesday, June 23, 2010 12:39 PM
> > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > >
> > > > > > Use Java Lucene 2.3.2, which the git master branch is based on. Grab
> > > > > > it from http://archive.apache.org/dist/lucene/java/, or you can use
> > > > > > tools like Krugle to read the code online.
> > > > > >
> > > > > > You may only need this to port TestDocumentsWriter as a whole.
> > > > > > To fix this specific issue I think it is enough to follow the
> > > > > > patch attached to the JIRA issue. I'm not sure it was deployed
> > > > > > to the 2.3.2 sources, btw.
> > > > >
> > > > > Itamar.
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > > > Sent: Wednesday, June 23, 2010 12:10 PM
> > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > > >
> > > > > > > I'm not sure which JLucene version I should use (and where to get it).
> > > > > >
> > > > > > Borek
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Itamar Syn-Hershko [mailto:ita...@code972.com]
> > > > > > > Sent: Wednesday, June 23, 2010 12:11 AM
> > > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > > > >
> > > > > > > > Those are the postings array and its staging area for flushing. Once
> > > > > > > > flushed, a Posting object can be deleted.
> > > > > > > >
> > > > > > > > The code you quoted was originally written in Java as:
> > > > > > > >   Arrays.fill(postingsFreeList, postingsFreeCount-numToFree,
> > > > > > > >               postingsFreeCount, null);
> > > > > > > >
> > > > > > > > Meaning, this is not a deletion but rather a nullification. This may
> > > > > > > > actually be proper behavior for Java, since the garbage collector
> > > > > > > > reclaims objects once nothing references them. However, it seems to
> > > > > > > > have caused issues with JLucene as well for documents with many terms:
> > > > > > > > https://issues.apache.org/jira/browse/LUCENE-1072. The only question
> > > > > > > > is how come we haven't seen this until now, and what's special
> > > > > > > > about the Reuters corpus?
> > > > > > > >
> > > > > > > > I think, if you could port TestDocumentsWriter to cl_test
> > > > > > > > (at least the relevant test case they have added) and check
> > > > > > > > whether it crashes with the same characteristics as your issue,
> > > > > > > > we could verify this is the same issue. Then we can apply their
> > > > > > > > patch (while following the JIRA discussion) accordingly to
> > > > > > > > DocumentsWriter.cpp.
> > > > > > >
> > > > > > > Itamar.
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > > > > > Sent: Tuesday, June 22, 2010 11:53 PM
> > > > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > > > > >
> > > > > > > > > I did some research and found the following:
> > > > > > > > >
> > > > > > > > > The problem is caused by the freeing loop in balanceRAM()
> > > > > > > > > (documentswriter.cpp:1325)
> > > > > > > > >
> > > > > > > > >         for ( size_t i = this->postingsFreeCountDW-numToFree;
> > > > > > > > >               i < this->postingsFreeListDW.length; i++ ){
> > > > > > > > >           _CLDELETE(this->postingsFreeListDW.values[i]);
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > Because this->postingsFreeListDW.values contains pointers that are
> > > > > > > > > also used in the postingsHash table, the _CLDELETE makes them invalid.
> > > > > > > > >
> > > > > > > > > So the main question is why Posting objects referenced in
> > > > > > > > > postingsHash are also referenced by postingsFreeListDW.
> > > > > > > > >
> > > > > > > > > So far I have not been able to find the reason.
> > > > > > > >
> > > > > > > > Borek
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Itamar Syn-Hershko [mailto:ita...@divrei-tora.com]
> > > > > > > > > > Sent: Monday, June 21, 2010 2:08 PM
> > > > > > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > > > > Subject: Re: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > > This seems to be the same error reported by Klemens Friedl
> > > > > > > > > > last week [1].
> > > > > > > > > >
> > > > > > > > > > I can confirm your findings. After setting the demo application to
> > > > > > > > > > index the Reuters corpus distributed with CLucene (see my patch
> > > > > > > > > > to master today), and setting maxFieldLength to MAX_INT, the
> > > > > > > > > > application fails on one of the files (for me it was
> > > > > > > > > > reut2-002.sgm). The call stack points to
> > > > > > > > > > DocumentsWriterThreadState.cpp ln 1142, where
> > > > > > > > > > threadState->p is pointing to freed or invalid memory.
> > > > > > > > > >
> > > > > > > > > > Unfortunately, at the moment I cannot work on tracing this
> > > > > > > > > > properly. If you can do this yourself, I'll be happy to assist
> > > > > > > > > > with whatever I can.
> > > > > > > > >
> > > > > > > > > Itamar.
> > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > > http://comments.gmane.org/gmane.comp.jakarta.lucene.clucene.devel/3449 .
> > > > > > > > > > Also see
> > > > > > > > > > http://sourceforge.net/tracker/?func=detail&aid=2981449&group_id=80013&atid=558446.
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Kostka Bořivoj [mailto:kos...@tovek.cz]
> > > > > > > > > Sent: Monday, June 21, 2010 2:50 PM
> > > > > > > > > To: clucene-developers@lists.sourceforge.net
> > > > > > > > > > Subject: [CLucene-dev] vector subscript out of range exception during indexing
> > > > > > > > > >
> > > > > > > > > > While indexing a set of documents (about 10000 already
> > > > > > > > > > indexed) I get the exception "vector subscript out of range"
> > > > > > > > > > from ArrayBase operator [ ].
> > > > > > > > > > I did some research and it seems this happens because the
> > > > > > > > > > threadState->postingEquals() method is called with an invalid p set.
> > > > > > > > > > postingsHash[hashPos] probably contains a pointer to an
> > > > > > > > > > already deleted object, as 0xfeee is in all members (I'm running it
> > > > > > > > > > under the MSVC 2005 debugger).
> > > > > > > > > > See the call stack and threadState->p dump below.
> > > > > > > > >
> > > > > > > > > Source (documentswriterthreadstate.cpp:1010)
> > > > > > > > > ======
> > > > > > > > >
> > > > > > > > >   // Locate Posting in hash
> > > > > > > > >   threadState->p = postingsHash[hashPos];
> > > > > > > > >
> > > > > > > > > >   if (threadState->p != NULL &&
> > > > > > > > > >       !threadState->postingEquals(tokenText, tokenTextLen)) { ...
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Call stack
> > > > > > > > > ========
> > > > > > > > > > clucene-cored.dll!lucene::util::ArrayBase<wchar_t *>::operator[](unsigned int _Pos=0xfffffbbb)  Line 92   C++
> > > > > > > > > >
> > > > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::postingEquals(const wchar_t * tokenText=0x032772a8, const int tokenTextLen=0x00000008)  Line 577 + 0x25 bytes     C++
> > > > > > > > > >
> > > > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::addPosition(lucene::analysis::Token * token=0x0100c770)  Line 1012 + 0x26 bytes     C++
> > > > > > > > > >
> > > > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field * field=0x04d2a9e0, lucene::analysis::Analyzer * analyzer=0x010a5fa0, const int maxFieldLength=0x00002710)  Line 902      C++
> > > > > > > > > >
> > > > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 797        C++
> > > > > > > > > >
> > > > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 554 + 0x1a bytes   C++
> > > > > > > > > >
> > > > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::updateDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0, lucene::index::Term * delTerm=0x00000000)  Line 934 + 0xc bytes C++
> > > > > > > > > >
> > > > > > > > > > clucene-cored.dll!lucene::index::DocumentsWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 919 C++
> > > > > > > > > >
> > > > > > > > > > clucene-cored.dll!lucene::index::IndexWriter::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * analyzer=0x010a5fa0)  Line 670 + 0x13 bytes    C++
> > > > > > > > > >
> > > > > > > > > > clucene-cored.dll!lucene::index::IndexModifier::addDocument(lucene::document::Document * doc=0x0012f600, lucene::analysis::Analyzer * docAnalyzer=0x010a5fa0)  Line 100   C++
> > > > > > > > > >
> > > > > > > > > > mkidx.exe!tovek::index::Index::indexDocument(tovek::index::Document & doc={...}, bool bInsert=false, unsigned long & ulPrevDoc=0x00000007, tovek::analysis::CachedAnalyzer * pCachedAnalyzer=0x010a5fa0)  Line 472 C++
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Problematic item in PostingHash:
> > > > > > > > > =========================
> > > > > > > > >
> > > > > > > > > > -             threadState->p  0x02538fd8 {textStart=0xfeeefeee docFreq=0xfeeefeee freqStart=0xfeeefeee ...}  lucene::index::DocumentsWriter::Posting *
> > > > > > > > >               textStart       0xfeeefeee      int
> > > > > > > > >               docFreq 0xfeeefeee      int
> > > > > > > > >               freqStart       0xfeeefeee      int
> > > > > > > > >               freqUpto        0xfeeefeee      int
> > > > > > > > >               proxStart       0xfeeefeee      int
> > > > > > > > >               proxUpto        0xfeeefeee      int
> > > > > > > > >               lastDocID       0xfeeefeee      int
> > > > > > > > >               lastDocCode     0xfeeefeee      int
> > > > > > > > >               lastPosition    0xfeeefeee      int
> > > > > > > > > > +             vector  0xfeeefeee {p=??? lastOffset=??? offsetStart=??? ...}  lucene::index::DocumentsWriter::PostingVector *
> > > > > > > > >
> > > > > > > > >
> > > ------------------------------------------------------------
> > > > > > > > > --
> > > > > > > > > ----------------
> > > > > > > > > ThinkGeek and WIRED's GeekDad team up for the 
> Ultimate 
> > > > > > > > > GeekDad Father's Day Giveaway. ONE MASSIVE 
> PRIZE to the 
> > > > > > > > > lucky
> > > > > > parental unit.
> > > > > > > > > See the prize list and enter to win:
> > > > > > > > > http://p.sf.net/sfu/thinkgeek-promo
> > > > > > > > > _______________________________________________
> > > > > > > > > CLucene-developers mailing list 
> > > > > > > > > CLucene-developers@lists.sourceforge.net
> > > > > > > > >
> > > https://lists.sourceforge.net/lists/listinfo/clucene-develop
> > > > > > > > > ers
> > > > > > > > >


