Also, see the thread on this list titled "I just don't get wildcards at all" for an extensive discussion of this issue, as well as of wildcards in general. You might also search the archive for wildcards.

The short form is that any wildcard query (including prefix queries) expands under the covers into one clause for each term in the index that matches it for that field. For instance, say a field had the following values:

abcd
abck
abt

Searching for ab* would expand to searching for abcd, abck and abt under the covers. When the number of possibilities gets above the default value of 1024, you see a TooManyClauses exception.

Raising the maximum number of clauses *may* fix you right up, but on any reasonably sized index, you can come up with a query that'll exceed whatever number you set. Or you'll hit an unacceptable performance/memory footprint. Imagine your query with things like a*.

Think seriously about how you're going to deal with this. There are several options:

1> Use filters for all your wildcard clauses and create your own BooleanQuery. Be aware that using filters affects scoring.

2> Assume that any query that throws a TooManyClauses exception (after you've set a suitable max as Paul suggested) is too broad to be useful, and respond to the user with some polite phrase asking them to refine the query.

3> Look over the SrndQuery classes. I don't fully understand these, but they certainly behave much differently in this area. Note that SrndQuery limits wildcards to having at least three non-wildcard characters.

4> Ask whether stemming is a complete or partial solution. Ditto for Soundex. There's a good chance these won't apply, but they may.

5> <Insert the solution to your specific problem here>

This is a sticky wicket that will probably consume more time than you think to handle. It's easy for your product manager to claim that "Of course, we must support arbitrary wildcards", but I'd urge you to seriously ask what value *arbitrary* wildcards bring to the product. When you start getting thousands of responses to a query, is it actually valuable to return them to the user? Or do you give her just as much value (and deliver product sooner) by telling her up front that she's getting too many responses to be useful? With this last strategy, you just catch the TooManyClauses exception and respond with "refine your query".

Best
Erick

On 12/27/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
Chris,

On Wednesday 27 December 2006 15:42, Chris Salem wrote:
> Hi All,
>
> I'm getting a 'TooManyClauses' Exception and I'm not sure how to fix this. Here's a sample query that I'm using:
>
> +(+freeform_text:exhibit* +(+freeform_text:dispaly +freeform_text:event*) +(+freeform_text:sale* +freeform_text:sells +freeform_text:develop*) +(+freeform_text:trade +freeform_text:show +freeform_text:trade +freeform_text:shows)) +degree_type:5 +position_desired:ftp +city:washington~0.5 +state:dc +ncountry:usa +last_modified:[2005-12-26 TO 2006-12-26]
>
> Here's the exception I'm getting:
>
> org.apache.lucene.search.BooleanQuery$TooManyClauses
> at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:160)
> at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:151)
> at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:52)

One of the prefix queries is causing this, possibly event* or sale*. Since they seem to be specific enough, increasing the maximum number of boolean clauses that can be added to a boolean query appears to be a good way to fix this; see BooleanQuery.setMaxClauseCount().
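To make the expansion and the clause limit concrete, here is a minimal, self-contained sketch. It is not Lucene's code: PrefixExpansionSketch, expand and TooManyClausesSketch are hypothetical names made up for illustration, and the sorted set is only a stand-in for the terms of one field in the index. It uses the abcd/abck/abt values from Erick's example and assumes nothing beyond the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixExpansionSketch {

    /** Hypothetical stand-in for Lucene's TooManyClauses; not the real class. */
    static class TooManyClausesSketch extends RuntimeException {
        TooManyClausesSketch(int max) {
            super("more than " + max + " clauses");
        }
    }

    /**
     * Expands a prefix such as "ab" (from the query ab*) into one clause per
     * matching term, and fails once the clause cap would be exceeded.
     */
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet starts at the first term >= prefix; in sorted order all
        // terms sharing the prefix are contiguous, so we can stop at the first miss.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() >= maxClauses) {
                throw new TooManyClausesSketch(maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
                new TreeSet<>(Arrays.asList("abcd", "abck", "abt", "xyz"));

        // Three terms match ab*, so the query becomes three clauses.
        System.out.println(expand(terms, "ab", 1024)); // prints [abcd, abck, abt]

        // With an artificially low cap, the same query is rejected and the
        // caller can answer with a "refine your query" message.
        try {
            expand(terms, "ab", 2);
        } catch (TooManyClausesSketch e) {
            System.out.println("Too many matches, please refine your query.");
        }
    }
}
```

Against a real index the same pattern would apply: raise the cap with BooleanQuery.setMaxClauseCount() and catch the BooleanQuery$TooManyClauses exception seen in the stack trace around the search call. Whatever cap you pick, a short enough prefix on a large enough index will still exceed it.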
Regards,
Paul Elschot

> at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
> at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
> at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:372)
> at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:137)
> at org.apache.lucene.search.Query.weight(Query.java:93)
> at org.apache.lucene.search.Hits.<init>(Hits.java:41)
> at org.apache.lucene.search.Searcher.search(Searcher.java:44)
> at org.apache.lucene.search.Searcher.search(Searcher.java:36)
> at net.mainsequence.pcr.lucene.LuceneHandler.multiSearch(LuceneHandler.java:382)
> at net.mainsequence.pcr.lucene.LuceneServlet.searchIndex(LuceneServlet.java:169)
> at net.mainsequence.pcr.lucene.LuceneServlet.processRequest(LuceneServlet.java:83)
> at net.mainsequence.pcr.lucene.LuceneServlet.doPost(LuceneServlet.java:72)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
> at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
> at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
> at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
> at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
> at java.lang.Thread.run(Unknown Source)
>
> Is there any way to increase the number of clauses Lucene can take? This kind of large query is not uncommon, so any help would be greatly appreciated.
>
> Chris Salem
> 440.946.5214 x5458
> [EMAIL PROTECTED]