On Jul 7, 2005, at 1:12 PM, MariLuz Elola wrote:

Hi Erik, excuse me for all my questions. Thank you very much for your speedy answers, and sorry for my bad English.
I am Spanish and I don't speak English very well.
Well, I have one more question.
In the end I am using IndexReader to return all the documents:
    Directory directory = FSDirectory.getDirectory(path, false);
    IndexReader reader = IndexReader.open(directory);
    for (int start = base; start < end; start++) {
        Document doc = reader.document(start);
        String id = doc.get(es.seinet.xtent.searchEngine.lucene.general.Util.ID);
        ides.add(id);
    }
It works well and is fast. The only problem is that it is impossible to sort the results by a metadata field (for example, to get all the documents ordered by title).

If you truly need to have a Query that can find all documents, then add a special field to each document with a fixed value such as doc:yes and then do a TermQuery for doc:yes. You could then leverage Lucene's sorting capability.

My question is about the parameter maxClauseCount. I agree with you: it is not a good idea to bump up the limit. But if I use the default value (1024) and I search, I get this error: [SearchCollection,executeQuery] caught a class org.apache.lucene.search.BooleanQuery$TooManyClauses
with message: null

Is there any way to search all the documents (210,000 documents) so that Lucene internally works with only 1024 clauses, returns documents up to that limit, and does not throw the TooManyClauses error? I need to work efficiently with collections of more than 250,000 records, and the users normally run complex queries (e.g. DATE:[20050601 TO 20050701] AND TITLE:Lucene* ... etc.)

The issue is that PrefixQuery, WildcardQuery, RangeQuery, and FuzzyQuery all expand to the terms that match in a BooleanQuery OR fashion. You need to identify what terms those are and address them individually. I can't offer specific advice since I don't know what fields you're using and what values they may contain. But one example is with dates. If you index dates and do it at the millisecond granularity but you really only need to query by YEAR then there is a great chance one of those query types will expand to TooManyClauses. If, instead, you indexed dates by YYYY when all you need is year granularity then you have far fewer terms. I hope this makes sense and helps.
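
To illustrate the granularity point, here is a minimal sketch in plain Java (no Lucene classes involved). It counts how many distinct terms a date field would contain if one document were indexed per day over a span of years, using different date patterns. The class name, the method names, and the one-document-per-day setup are assumptions made for illustration only; day granularity stands in for any finer granularity such as milliseconds. The limit of 1024 is the default maxClauseCount mentioned above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: estimates how many terms a
// range query over a date field would have to expand to.
class RangeExpansionSketch {

    // Default clause limit quoted in this thread.
    static final int DEFAULT_MAX_CLAUSE_COUNT = 1024;

    // Counts the distinct terms produced when every day from Jan 1 of
    // startYear to Dec 31 of endYear is formatted with the given pattern.
    static int distinctTerms(int startYear, int endYear, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        LocalDate end = LocalDate.of(endYear, 12, 31);
        Set<String> terms = new TreeSet<>();
        for (LocalDate d = LocalDate.of(startYear, 1, 1); !d.isAfter(end); d = d.plusDays(1)) {
            terms.add(d.format(fmt));
        }
        return terms.size();
    }

    // A query that expands to more clauses than the limit would fail.
    static boolean exceedsLimit(int termCount) {
        return termCount > DEFAULT_MAX_CLAUSE_COUNT;
    }

    public static void main(String[] args) {
        for (String pattern : new String[] {"yyyyMMdd", "yyyyMM", "yyyy"}) {
            int count = distinctTerms(2000, 2009, pattern);
            System.out.println(pattern + ": " + count + " terms, exceeds limit: " + exceedsLimit(count));
        }
    }
}
```

A range query covering the whole ten-year span has to OR together every matching term, so the coarser the indexed value, the fewer clauses the expanded query needs.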

    Erik
