On 07.06.2006 at 16:29, Stephen Deasey wrote:


> I don't have time to try and reproduce the problems from scratch, but
> I'd be happy to fix it if there's a couple of concise test cases.

But I was forced to do so... (customer support)...

Well, I have fixed the spurious "timeout waiting for update:.."
problems I was experiencing. The cause was rather trivial and
could be classed as an omission in the Ns_CacheSetValueExpires() code
(see the diff between 1.8 and 1.9 for the changes).

Another problem was that ns_cache_exists returned true
for values that had already expired internally. I corrected that
as well (see also below).

Stephen, there is an *architectural* problem between Ns_CacheFindEntry(),
which internally calls ExpireEntry(), and Ns_CacheWaitCreateEntry(),
which checks the value for non-NULL and otherwise waits forever (or for
some time, eventually aborting with a timeout).

The API sequence for that is simple:

  call Ns_CacheFindEntry()
    this one will check the entry and possibly call ExpireEntry()
    which will unset the entry value but NOT delete the entry itself

  call Ns_CacheWaitCreateEntry which will find the entry (as it is
    not deleted) but will wait forever (or timeout) because the entry
    value is empty and nobody is going to set it any more
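
To make the sequence concrete, here is a minimal single-threaded toy model
of it. It is only a sketch: ToyEntry, ToyExpire, ToyFind and ToyWouldWait
are hypothetical names invented for illustration, not the real cache code,
and the actual blocking wait is reduced to a yes/no answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Toy single-slot "cache"; all names are hypothetical, not NaviServer's. */
typedef struct ToyEntry {
    const char *key;      /* NULL means the slot is unused */
    const char *value;    /* NULL means "no value (yet)" */
    time_t      expires;  /* absolute expiry time, 0 = never */
} ToyEntry;

/* Models the expire step: drops the value but keeps the entry itself. */
static void ToyExpire(ToyEntry *e)
{
    assert(e != NULL);
    e->value = NULL;
    e->expires = 0;
}

/* Models the find step: an expired entry is reported as "not found",
 * but the now value-less entry stays in the slot. */
static ToyEntry *ToyFind(ToyEntry *slot, const char *key, time_t now)
{
    assert(slot != NULL && key != NULL);
    if (slot->key == NULL || strcmp(slot->key, key) != 0) {
        return NULL;
    }
    if (slot->expires != 0 && slot->expires <= now) {
        ToyExpire(slot);
        return NULL;
    }
    return slot;
}

/* Models the wait-create decision without real blocking: returns true
 * when the entry exists but has no value, i.e. the caller would wait
 * for somebody else to set it. */
static bool ToyWouldWait(const ToyEntry *slot, const char *key)
{
    assert(slot != NULL && key != NULL);
    return slot->key != NULL
        && strcmp(slot->key, key) == 0
        && slot->value == NULL;
}
```

With a slot holding key "k" that expires at time 100, ToyFind() at time 200
returns NULL and leaves the entry in place without a value. ToyWouldWait()
then reports true even though no other thread is going to set the value.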

This is what really happened in my case even before the 1.8 version
of the file, but it was just harder to trigger. In 1.8 it is trivial
to trigger and our app breaks very early there.

Now the question is: how can we fix that?

One solution would be to really delete the expired entry in
Ns_CacheFindEntry() instead of just calling ExpireEntry(). This would
keep the existing logic in Ns_CacheWaitCreateEntry() working.

Another solution would be to add a new bit to the Entry structure marking
the entry as expired, then add new logic to Ns_CacheWaitCreateEntry()
that checks that bit and acts accordingly.
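
The two options can be compared in a small toy model. Again this is only a
sketch under my own assumptions: ToyEntry, its inUse/expired fields and the
ToyAction values are hypothetical and do not reflect the real Entry
structure or the real locking and condition-variable handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy entry; not the real NaviServer Entry structure. */
typedef struct ToyEntry {
    bool        inUse;    /* false once the entry has been deleted */
    bool        expired;  /* option 2: set when the value was expired */
    const char *value;    /* NULL means "no value" */
} ToyEntry;

/* Option 1: expiring an entry deletes it outright. */
static void ToyExpireByDelete(ToyEntry *e)
{
    assert(e != NULL);
    e->value = NULL;
    e->inUse = false;
}

/* Option 2: expiring keeps the entry but marks it as expired. */
static void ToyExpireByFlag(ToyEntry *e)
{
    assert(e != NULL);
    e->value = NULL;
    e->expired = true;
}

typedef enum { TOY_CREATE, TOY_WAIT, TOY_READY } ToyAction;

/* What a wait-create caller should do on reaching this slot. */
static ToyAction ToyWaitCreateAction(ToyEntry *e)
{
    assert(e != NULL);
    if (!e->inUse) {
        /* Deleted (option 1) or never created: caller creates it. */
        e->inUse = true;
        e->expired = false;
        e->value = NULL;
        return TOY_CREATE;
    }
    if (e->expired) {
        /* Option 2: nobody will refill it, so the caller takes over. */
        e->expired = false;
        return TOY_CREATE;
    }
    /* A live entry without a value is being filled by someone else. */
    return e->value == NULL ? TOY_WAIT : TOY_READY;
}
```

In both variants the first caller after expiry is told to create the value
instead of waiting, and only a second caller arriving while the value is
still unset has to wait.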

Please tell me what you think, as soon as possible if you can,
as I have some very angry customers chasing me. I'm OK with any of
the proposed changes or with any other as well, as long as it fixes
the problem.

What I COULD NOT verify were any memory-related problems such as those
Vlad is experiencing. I can run the code through Purify once more, but
this will take me another two days of work.

Cheers,
Zoran
