On 14. Jan 2007, at 2:40, Mark Miller wrote:

First, have you looked at SwarmCache? Cluster aware caching for java...

No, I haven't come across that one. I'll take a look, thanks!
As a matter of fact, we do have a network-wide caching mechanism, so that's what we use.

Second...does it matter that you cannot share the same cache across multiple servers? How about a separate cache on each server? When a request hits a particular server for the first time it builds the filter and caches it. I do a lot of filter caching that way with EHcache.

I currently cache it on each server (just a Map<Integer, Filter> kind of dumb cache). This works fine, but I'm concerned about production use. The first cache miss is already hurting us (and you can't avoid the first one, as you have to calculate the filter at least once), but going through it a second time on a second server is disastrous. In our application there are thousands of concurrent users querying the database interactively, and as it is used over the web, this has to be fast. Really fast. In general I require a sub-second response time. Calculating the filter currently takes anywhere between 0.5 and 40 seconds, depending on the user making the query. When the user is paging, we will probably just re-execute the search instead of caching the results, but we might not end up doing the search on the same Lucene server. Having the delay on the first query is bad. Potentially having it on the following pages, too, is not going to work. We might end up with 10 or more Lucene servers, so ending up on a different server for each page isn't that unlikely.
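For illustration, the per-server cache described above could look roughly like the sketch below. It assumes a modern JDK and uses java.util.BitSet as a stand-in for a Lucene Filter; the class name PerServerFilterCache and the builder function are hypothetical and not part of the actual application. The only point it makes is that concurrent requests for the same user on one server trigger a single expensive calculation. It does nothing about the second server, which is the real problem.

```java
import java.util.BitSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

/** Per-server cache keyed by user id. BitSet is a stand-in for a Lucene Filter (assumption). */
final class PerServerFilterCache {
    private final ConcurrentHashMap<Integer, BitSet> cache = new ConcurrentHashMap<>();
    private final IntFunction<BitSet> expensiveBuilder; // hypothetical: the 0.5 to 40 second calculation
    private final AtomicInteger builds = new AtomicInteger();

    PerServerFilterCache(IntFunction<BitSet> expensiveBuilder) {
        this.expensiveBuilder = expensiveBuilder;
    }

    /** Returns the cached filter, building it at most once per user on this server. */
    BitSet filterFor(int userId) {
        // computeIfAbsent runs the builder atomically per key, so concurrent
        // requests for the same user wait for one build instead of starting several.
        return cache.computeIfAbsent(userId, id -> {
            builds.incrementAndGet();
            return expensiveBuilder.apply(id);
        });
    }

    /** Number of times the expensive builder actually ran. */
    int buildCount() {
        return builds.get();
    }
}
```

With 10 servers, each one still pays the first miss separately, and the cached BitSet is only valid for as long as the document ids it was built against stay put.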

There must be some way to cache that filter...

My ever-recurring thought over the last couple of days... ;)

The reasoning I put forth in my first mail led me to the insight that we cannot cache the filter the way Lucene implements filters right now. Even with the filter cached, the document ids may change. This would lead to potentially wrong results, because my filter cache is now stale (filtering the wrong documents). This I must avoid, naturally. Lucene is giving me a cache practically for free already, but I find its ever-changing document ids a problem. What I'd really like is to be able to assign document ids myself and have the documents stay at those positions. The fact that I have to query for a field, one that I moreover know to be unique, is a huge bottleneck. OK, I understand that this isn't an issue for full-text search, and most of the applications we are using it for don't have that id requirement, but having cheap, direct access to the documents in an externally cacheable manner appeals to me. I'm sure there are others who have been bitten by this, too, and I'm willing to invest some time in the implementation. I'm just not familiar enough with the codebase to simply begin hacking. Also, of course, I might be missing something crucial here that makes my problem a non-issue (which I would prefer :))
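To make the staleness problem concrete, here is a minimal sketch of what an externally cacheable filter might look like. It is plain Java with no Lucene calls. The class StableIdFilter, the externalIdToDocNumber lookup table and the maxDoc parameter are all hypothetical names chosen for illustration. The sketch caches the stable external ids (which could be shared across servers) and translates them into document numbers separately for each index version.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Filter stored as stable external ids rather than index-internal document numbers. */
final class StableIdFilter {
    private final Set<Long> allowedExternalIds;

    StableIdFilter(Set<Long> allowedExternalIds) {
        // Defensive copy: the cached filter must not change after construction.
        this.allowedExternalIds = new HashSet<>(allowedExternalIds);
    }

    /**
     * Translates the stable ids into a bit set of document numbers for ONE
     * index version. The lookup table is an assumed input here; producing it
     * cheaply is exactly the open problem.
     */
    BitSet toDocNumbers(Map<Long, Integer> externalIdToDocNumber, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Long externalId : allowedExternalIds) {
            Integer docNumber = externalIdToDocNumber.get(externalId);
            if (docNumber != null) { // ids missing from this index version are skipped
                bits.set(docNumber);
            }
        }
        return bits;
    }
}
```

The cached part stays correct when document numbers shift, but the cost does not go away: it moves into building the lookup table once per index version. Without a cheap way to get from the unique field to a document number, that step is still the bottleneck.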

Thanks,

Kay

--
Kay Röpke
http://classdump.org/




