I believe there is a hard-coded limit in the Lucene code that ensures that only the first 10000 terms are indexed. I can't remember which class that is in, but you can do a find.....|xargs grep 10000 under Unix to find it.
Otis --- Karl_�ie <[EMAIL PROTECTED]> wrote: > I have created a testclass for working with Analyzers and ran into a > strange > problem; I cannot search for text in fields with more than 10000 > words!?!? > > I have tested for various bugs in my test class, but I cannot find > anything > there (please have a look, files are attached). > > > the class "AnalayzerTest" can be used like this: > > "java -cp lucene-1.2-rc3-dev.jar > org.apache.lucene.analysis.AnalyzerTest > voc.txt voc_out.txt" > > where the "voc.txt" and "voc_out.txt" also are included in the zip > file. > > > The approach is simple: voc.txt contains 20628 Norwegian words, to > test the > Analyzer I try to do this: > > - create a string containing all the 20628 words separated with " ". > - create a lucene document and index this string as a text field. > - add this one document to an index > - loop trough the words again and query the index for each of the > same words > in the list. > - if everything works every word should yield a hit in the single > document > that exist in the index. > > > To be sure nothing is filtered I have used the WhitespaceAnalyzer > analyzer > (or NullAnalyzer...). > > > But here comes the problems: > ---------------------------- > > If I try to run all the 20628 words, the last 10628 words can not be > found > by the IndexSearcher. If I flip the words around(reverse > alpha-order). I > cannot find the 10628 first words!!. > > If I limit the wordlist to 10000, I get a perfect match for either > the first > or last 10000 words. If I set the limit to 10005 I will get 5 words > not > found at the beginning or end of the list according to order. > > > Does anyone know what's going on here?? I would be very happy if > someone > could point to a place in my code where I have done something really > stupid, > because I have tried to track this for a hole day. > > > mvh karl �ie/gan media > > ATTACHMENT part 2 application/x-zip-compressed name=AnalyzerTest.zip > -- > To unsubscribe, e-mail: > <mailto:[EMAIL PROTECTED]> > For additional commands, e-mail: <mailto:[EMAIL PROTECTED]> __________________________________________________ Do You Yahoo!? Great stuff seeking new owners in Yahoo! Auctions! http://auctions.yahoo.com -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
