I believe there is a hard-coded limit in the Lucene code that ensures
that only the first 10000 terms are indexed.  I can't remember which
class that is in, but you can do a find.....|xargs grep 10000 under
Unix to find it.

Otis

--- Karl_�ie <[EMAIL PROTECTED]> wrote:
> I have created a testclass for working with Analyzers and ran into a
> strange
> problem; I cannot search for text in fields with more than 10000
> words!?!?
> 
> I have tested for various bugs in my test class, but I cannot find
> anything
> there (please have a look, files are attached).
> 
> 
> the class "AnalayzerTest" can be used like this:
> 
> "java -cp lucene-1.2-rc3-dev.jar
> org.apache.lucene.analysis.AnalyzerTest
> voc.txt voc_out.txt"
> 
> where the "voc.txt" and "voc_out.txt" also are included in the zip
> file.
> 
> 
> The approach is simple: voc.txt contains 20628 Norwegian words, to
> test the
> Analyzer I try to do this:
> 
> - create a string containing all the 20628 words separated with " ".
> - create a lucene document and index this string as a text field.
> - add this one document to an index
> - loop trough the words again and query the index for each of the
> same words
> in the list.
> - if everything works every word should yield a hit in the single
> document
> that exist in the index.
> 
> 
> To be sure nothing is filtered I have used the WhitespaceAnalyzer
> analyzer
> (or NullAnalyzer...).
> 
> 
> But here comes the problems:
> ----------------------------
> 
> If I try to run all the 20628 words, the last 10628 words can not be
> found
> by the IndexSearcher. If I flip the words around(reverse
> alpha-order). I
> cannot find the 10628 first words!!.
> 
> If I limit the wordlist to 10000, I get a perfect match for either
> the first
> or last 10000 words. If I set the limit to 10005 I will get 5 words
> not
> found at the beginning or end of the list according to order.
> 
> 
> Does anyone know what's going on here?? I would be very happy if
> someone
> could point to a place in my code where I have done something really
> stupid,
> because I have tried to track this for a hole day.
> 
> 
> mvh karl �ie/gan media
> 

> ATTACHMENT part 2 application/x-zip-compressed name=AnalyzerTest.zip
> --
> To unsubscribe, e-mail:  
> <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>


__________________________________________________
Do You Yahoo!?
Great stuff seeking new owners in Yahoo! Auctions! 
http://auctions.yahoo.com

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Reply via email to