What kind of queries are these? I.e., how much work goes into step 4? Is it a fairly standard combination of Boolean/Phrase/other stock Lucene queries built up from tokenizing the text?
If so, it's going to be nowhere near the bottleneck in your runtime (we're often talking well under a millisecond per query), so you can save this optimization for last. Caching the key/value store lookup (especially if it's remote!) makes sense, but producing the Lucene Query object is only slow if you're doing some *really* crazy stuff. That happens sometimes, to be sure, but usually the crazier your query gets, the slower the search itself gets (step 5), so even in that case the bottleneck is not in Query object creation, which happens once per query. Step 5, by contrast, involves work which happens once per *document* matching the query (the inner loop, versus setup before the loop starts).

-jake

On Wed, Dec 2, 2009 at 5:43 PM, Erdinc Yilmazel <erd...@yilmazel.com> wrote:
> Hi,
>
> In my application, certain kinds of queries for certain kinds of inputs
> will be repeated on the Lucene index. The application flow is something
> like this:
>
> 1. Get input A
> 2. Look up a key/value store for key A
> 3. Load a text from the key/value store to be used as a query
> 4. Analyze the text and build a Query object
> 5. Perform a search
>
> What I want to do is implement a cache for steps 2, 3 and 4. I don't
> want to analyze the query text again and again. Think of this as a
> distributed application running on several servers. What is the best way
> to cache the analyzed version of the input text? I can make a per-JVM
> cache by holding a previously created Query object for a specific input,
> but in a distributed environment, if I store the serialized form of the
> Query object, the overhead of deserializing may kill all the benefits of
> caching here...
>
> Thanks,
> Erdinc
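For what it's worth, the per-JVM cache Erdinc describes can be sketched with a `ConcurrentHashMap` and `computeIfAbsent`, which guarantees the expensive build runs at most once per key. This is a minimal sketch under stated assumptions: `QueryCache`, `queryFor`, and `buildQuery` are hypothetical names, and the real step 4 (running the Lucene Analyzer/QueryParser to produce an `org.apache.lucene.search.Query`) is stubbed out with a plain `String` so the example stays self-contained.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-JVM cache for analyzed queries (steps 2-4 of the flow).
public class QueryCache {
    // Maps input key A -> the built query. In a real app the value type
    // would be org.apache.lucene.search.Query; a String stands in here.
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger builds = new AtomicInteger(); // counts actual build calls

    // Return the cached query for this input, building it only on a miss.
    public String queryFor(String inputKey, String queryText) {
        return cache.computeIfAbsent(inputKey, k -> buildQuery(queryText));
    }

    // Stand-in for "analyze the text and build a Query object" (step 4).
    private String buildQuery(String text) {
        builds.incrementAndGet();
        return "parsed:" + text.toLowerCase();
    }

    public static void main(String[] args) {
        QueryCache qc = new QueryCache();
        qc.queryFor("A", "Hello World");
        qc.queryFor("A", "Hello World"); // second call is served from the cache
        System.out.println(qc.builds.get()); // prints 1: text analyzed only once
    }
}
```

Per Jake's point above, whether this cache is worth having depends on measuring step 4 against step 5; if query construction really is sub-millisecond, the map mostly buys you the saved remote key/value lookup, not the analysis.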