On 6/11/2010 5:58 AM, Joe Orton wrote:
> On Thu, Jun 10, 2010 at 07:28:55PM -0500, William Rowe wrote:
>> Have not checked specifically, the problem I observed was ejecting other
>> unexpired elts as other records were repeatedly updated.
>>
>> It seems the simple fix is to change the pre-store free logic,
>>
>>  * expire records [do this only if space needed?  right now, it's on all stores]
>>
>>  * if enospace, compress removed/expired records until space is available
>>
>>  * if still enospace, drop the first-to-expire [current behavior]
> 
> Looking at this... currently the code assumes that expiry times of the 
> objects are in time order, which is rather a leap of faith now.  It does 
> allow expiry to be extremely cheap though; you just have to move the 
> index pointers to the first still-valid entry.

Right - the expiry times shouldn't have to be in sequence, and probably should
be presumed not to be in sequence.  However, in order to salvage space, we
need to move some bytes on occasion.  I'm thinking in terms of compressing
the most recent space to gain a hole, until we ultimately compress the
entire block.  If we keep these of 'manageable size' (yes, define manageable)
then the impact should not be that great.

> I'd start to get rather worried about the performance impact of doing 
> anything more complicated, esp if by "compress" you mean actually moving 
> data records about.

That's why I suggested a greater number of subcaches.  We need solid holes
if we are going to maintain the usability of the subcache.  But there is
no sense in compressing spaces we don't need 'now'.

>> Seem sensible?
>>
>>> w.r.t. contract: specific behaviour will vary across the providers; 
>>> could nail down the ->retrieve API with a bunch of stuff which is *not* 
>>> guaranteed, I suppose, is that what you're thinking of?  "object 
>>> returned may be out-of-date/stale, etc"
>>
>> If we know it may be stale, we should declare that (in the shmcb case, this
>> never happens, IIRC).  We obviously should declare this is a lossy cache.
> 
> Yeah, I was thinking of memcache w.r.t. staleness.  "lossy cache" is a 
> tautology!

:)

>> In other events, the current logic of max 256 caches keyed on the first byte
>> of the ID is not sufficiently reusable.  What do folks suggest for the most
>> efficient hash method?
>>
>> I'm thinking we add to the hints that hashing is required, or assure that
>> the data is already 8-bit hashed and that leading bytes are just fine for
>> choosing subcaches.  In order to achieve the goal above of compressing
>> records, it would seem sensible to favor more subcaches with no more than
>> 64k worth of elts, which ensures any compressing memcpy's are more efficient.
> 
> Yes, this is a great point, and adding a "ids not random" hint seems 
> like a really excellent idea.
> 
> Could we just use apr_hashfunc_default() as the hash and XOR the bytes 
> of the returned int down to a single byte rather than import something 
> else?

Well, two bytes - but sure, apr_hashfunc_default() is fine.  That's sensible,
I'll work something up now.
