I think I have looked at constant score queries; however, the relevance scores are of value to the users, so we left it as is. :(
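For reference, my understanding is that with the 2.9-style rewrite methods
only the wildcard clauses would become constant-score, while the rest of the
query keeps its normal relevance scoring. A rough, untested sketch, assuming
the MultiTermQuery.setRewriteMethod API from 2.9 and our own field names:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.WildcardQuery;

public class ConstantScoreWildcardSketch {

    public static void main(String[] args) {
        BooleanQuery query = new BooleanQuery();

        // Rewrite the wildcard against a filter instead of expanding it into
        // one BooleanQuery clause per matching term, so it no longer counts
        // against maxClauseCount; it contributes a constant score instead.
        WildcardQuery wildcard = new WildcardQuery(new Term("content", "g*"));
        wildcard.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE);
        query.add(wildcard, Occur.MUST);

        // The non-wildcard clauses still score normally.
        query.add(new TermQuery(new Term("sender", "cpuser9")), Occur.MUST);

        System.out.println(query.toString());
    }
}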
Erick's idea of stripping wildcard terms that have fewer than an acceptable
number of characters is a good one, and I might try it once I get the time
(a rough sketch of that pre-processing is at the bottom of this mail).

Thanks,
M

________________________________
From: Mark Miller <markrmil...@gmail.com>
To: java-user@lucene.apache.org
Sent: Thursday, April 2, 2009 3:52:24 PM
Subject: Re: Search using MultiSearcher generates OOM on a 1GB total Partitioned indeces

You might try a constant score wildcard query (similar to a filter) - I think
you'd have to grab it from Solr's codebase until 2.9 comes out though. No
clause limit, and reportedly *much* faster on large indexes.

--
- Mark

http://www.lucidimagination.com


Lebiram wrote:
> Hi Erick
>
> The query was test data, basically in anticipation of searches across all
> indices (4 indexes) with 12 million docs that should yield very small
> results. Obviously that query does not happen in real life, but it did
> break the system. If some user thought of just inputting random words,
> the system would be brought to its knees and eventually die.
>
> Essentially, all our Lucene indexes have about 8 fields; 1 field is being
> used as a filter (timestamp), and the rest are normal fields which can
> accept wildcards.
>
> You have a point about Filters being useful for a few other fields we do
> have; I'll apply that. So that leaves about 5 fields that allow fuzzy
> search.
>
> Which goes back to the max clause problem. Lucene's default max clause
> count is 1024; is there any reason behind this limit?
>
> Thanks,
>
> M
>
>
> ________________________________
> From: Erick Erickson <erickerick...@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Thursday, April 2, 2009 2:34:47 PM
> Subject: Re: Search using MultiSearcher generates OOM on a 1GB total
> Partitioned indeces
>
> Ah, I get it now. Given that you bumped your max clause count up, it makes
> sense. I'm pretty sure that the wildcard expansion is the root of your
> memory problems. The folks on the list helped me out a lot in understanding
> what wildcards were about; see the thread titled "I just don't get wildcards
> at all" in the searchable archives from several years ago...
>
> Why do you want to generate queries of the form you showed? I'm
> wondering if this is an XY problem, and if you gave us a higher-level
> description of the problem you're trying to solve we'd be able to
> suggest other approaches. I have a really hard time imagining a use
> case where a user is well served by a clause that says
> "any document that has a word beginning with g and h and d and s...",
> so I'm assuming you're trying to solve something specific to your
> domain...
>
> But if you really, truly do require this form, consider Filters. If your
> problem really requires single-letter starts, consider creating 26
> Filters at startup time and using those (see ConstantScoreQuery).
> That'll chew up about 1.5M of memory each, faaaaar less than
> you're consuming presently, and will be blazingly fast. If you're
> not limited to single characters, *still* consider filters. They'll
> consume little memory and are quite speedy to construct.
>
> Best
> Erick
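A quick aside on the Filter suggestion above: a rough, untested sketch of
what the 26 startup filters might look like, assuming Lucene 2.4's
PrefixFilter, CachingWrapperFilter and ConstantScoreQuery(Filter). The class
and field names are only illustrative.

import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.PrefixFilter;
import org.apache.lucene.search.Query;

public class SingleLetterPrefixFilters {

    // One cached filter per single-letter prefix, built once at startup.
    private final Map<String, Filter> filters = new HashMap<String, Filter>();

    public SingleLetterPrefixFilters(String field) {
        for (char c = 'a'; c <= 'z'; c++) {
            String prefix = String.valueOf(c);
            // CachingWrapperFilter caches the bit set per IndexReader, so
            // repeated searches reuse it instead of re-enumerating terms.
            filters.put(prefix, new CachingWrapperFilter(
                    new PrefixFilter(new Term(field, prefix))));
        }
    }

    // Used in place of a scoring wildcard clause such as content:g*;
    // it matches the same documents but contributes a constant score.
    public Query prefixQuery(String prefix) {
        return new ConstantScoreQuery(filters.get(prefix));
    }
}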
> On Thu, Apr 2, 2009 at 5:04 AM, Lebiram <lebi...@ymail.com> wrote:
>
>> Hi Erick,
>>
>> I did a search just as the JVM started... so I'm thinking that the JVM was
>> busy with some startup stuff... and that this search required more memory
>> than was available at that time.
>>
>> Had I done this search a while after the JVM had started, this query would
>> have succeeded. I then pumped in several similar queries running on
>> different threads; it takes a long time but still runs to completion, until
>> one of them generates OOM. But still, queries like this are just using too
>> much memory.
>>
>> As for clauses, the BooleanQuery was set to a max clause count of...
>> 9,000,000. I'm guessing that might have caused the usage of too much
>> memory?
>>
>> I'll try the explain you've suggested.
>>
>> Thanks,
>>
>> M
>>
>>
>> ________________________________
>> From: Erick Erickson <erickerick...@gmail.com>
>> To: java-user@lucene.apache.org
>> Sent: Wednesday, April 1, 2009 6:51:13 PM
>> Subject: Re: Search using MultiSearcher generates OOM on a 1GB total
>> Partitioned indeces
>>
>> Think about putting this query in Luke and doing an "explain" for details,
>> but...
>>
>> I'm surprised this is working at all without throwing TooManyClauses
>> errors. Under the covers, Lucene expands your wildcards to all terms in the
>> field that match. For instance, assume your document field has the
>> following:
>> aa
>> ab
>> ac
>> ad
>> ae
>>
>> Now, searching for a* produces a clause like
>> (aa OR ab OR ac OR ad OR ae) in place of the a*.
>>
>> So your query is generating ginormous OR clauses: one that contains every
>> term in your content field starting with 'g', another with every term in
>> your content field starting with 'h', etc. So I suspect that your content
>> field doesn't have very many distinct terms in it...
>>
>> As for why it's returning few entries, what does this part of your query
>> return by itself? Since it's ANDed with your wildcard query, it might be
>> what's limiting your results.
>>
>> ((+sender:cpuser9 +viewers:cpuser4) (+sender:cpuser4 +viewers:cpuser9)
>> (+viewers:cpuser9 +viewers:cpuser4))
>>
>> But I'm puzzled, because saying that you're getting OOM errors doesn't
>> square very well with getting *any* results at all, so is there something
>> else going on?
>>
>> Best
>> er...@morequestionsthananswers.
>>
>>
>> On Wed, Apr 1, 2009 at 1:31 PM, Lebiram <lebi...@ymail.com> wrote:
>>
>>> Hi All,
>>>
>>> I have the following query on a 1GB index with about 12 million docs.
>>> As you can see, the terms consist of wildcards...
>>>
>>> query.toString()=+(+content:g* +content:h* +content:d* +content:s*
>>> +content:a* +content:w* +content:b* +content:c* +content:m* +content:e*)
>>> +((+sender:cpuser9 +viewers:cpuser4) (+sender:cpuser4 +viewers:cpuser9)
>>> (+viewers:cpuser9 +viewers:cpuser4))
>>>
>>> The Searcher is a MultiSearcher with 4 IndexSearchers pointing to 4
>>> different Lucene indexes.
>>> This search returns very few records, several tens of thousands of hits.
>>>
>>> Java is assigned 1GB max memory.
>>>
>>> Somehow this search eats the entire Java heap space and causes OOM.
>>> Looking at JProfiler, I see the org.apache.lucene package generating a lot
>>> of objects, which I believe is taking up all this space.
>>>
>>> Can anyone explain why this particular search might take so much memory?
>>> Is there anything I am doing wrong here?
>>> More importantly, is there anything I can do to improve this?
>>>
>>> -M
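For what it's worth, the stripping idea mentioned at the top could be done as
a pre-processing step on the raw query string before it ever reaches the
query parser. A minimal, untested sketch of that; the regex, the minimum
prefix length, and the assumption that terms are bare (no field: prefixes)
are all mine, not anything Lucene provides:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardPruner {

    // Matches a whitespace-delimited term ending in '*' (a trailing wildcard).
    private static final Pattern TRAILING_WILDCARD = Pattern.compile("\\S+\\*");

    // Drops trailing-wildcard terms whose prefix is shorter than minPrefix,
    // e.g. prune("g* h* searchable*", 3) keeps only "searchable*".
    public static String prune(String userQuery, int minPrefix) {
        Matcher m = TRAILING_WILDCARD.matcher(userQuery);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String term = m.group();
            int prefixLength = term.length() - 1; // characters before the '*'
            String replacement =
                    prefixLength < minPrefix ? "" : Matcher.quoteReplacement(term);
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        // Collapse the gaps left by removed terms.
        return out.toString().trim().replaceAll("\\s+", " ");
    }
}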