To follow up on my post from Thursday: I have written a very basic test
for TermPositions. The test shows that only the first 10001 tokens of a
document are considered when determining term frequency (i.e. when the
search term is at a position greater than 10001, my test fails).
Is this by design? Is there an obvious workaround so that the frequency
I receive is correct for my document?
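
To make the symptom concrete, here is a minimal plain-Java sketch of what I
think is happening. It makes no Lucene calls and is only an assumption about
the behaviour, not the actual indexing code: TokenCapSketch, positions and
maxTokens are names made up for illustration, and tokens are simply split on
whitespace.

```java
import java.util.ArrayList;
import java.util.List;

class TokenCapSketch {
    // Hypothetical helper (not part of Lucene): collects the positions of
    // `term` among the first `maxTokens` whitespace-separated tokens of `text`.
    // Anything past the cap is ignored, which is the assumed cause of the
    // low frequency.
    static List<Integer> positions(String text, String term, int maxTokens) {
        List<Integer> found = new ArrayList<>();
        String[] tokens = text.trim().split("\\s+");
        int limit = Math.min(tokens.length, maxTokens);
        for (int pos = 0; pos < limit; pos++) {
            if (tokens[pos].equals(term)) {
                found.add(pos);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Build a 10005-token document with the term at positions 5 and 10003.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10005; i++) {
            sb.append(i == 5 || i == 10003 ? "needle " : "filler ");
        }
        String doc = sb.toString();

        // With a cap of 10001 tokens only the first occurrence is counted.
        System.out.println(positions(doc, "needle", 10001).size());             // prints 1
        // Without a cap both occurrences are counted.
        System.out.println(positions(doc, "needle", Integer.MAX_VALUE).size()); // prints 2
    }
}
```

If the real cause is a cap like this, the frequency would be correct for
short documents and too low only for documents longer than the cap, which
matches what my test shows.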
Thank you for your consideration,
Tricia
On Thu, 29 Sep 2005, Tricia Williams wrote:
> I am finding that the TermDocs.freq() method is returning an incorrect value.
> I was wondering if anyone else had experienced this problem.
>
> I am using tp = IndexReader.termPositions( queryTerm ) to return an object
> which implements TermPositions. I then use tp.skipTo( docid ) to go
> directly to the document from which I wish to retrieve term positions. The
> following for loop adds the positions to my ArrayList which I use later:
>
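> // Read freq() once, then call nextPosition() exactly that many times
> // (the loop update must not call nextPosition() after the last position).
> int freq = tp.freq();
> for( int k = 0; k < freq; k++ )
> {
>     positionMatches.add( new Integer( tp.nextPosition() ) );
> }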
>
> In a document which I know has 48 references to the term, a frequency of
> 23 is returned. There doesn't seem to be a pattern to this as some other
> documents have (frequency, actual): (25, 48), (36, 43), (30, 149).
>
> These frequencies are from results within my code and confirmed in Luke,
> so I'm pretty certain that this isn't an error on my part.
>
> I've been trying to find the origin of this issue, without luck
> thus far. Any help or advice would be appreciated.
>
> Thanks,
> Tricia
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>