On 01/18/2008 02:19 PM Derick Rethans wrote:
> On Wed, 16 Jan 2008, Tobias Schlitt wrote:
>> On 01/16/2008 10:43 AM Derick Rethans wrote:
>>> On Tue, 15 Jan 2008, Tobias Schlitt wrote:

>>>> I started a small design doc for issue #10531 (ezcCacheStorageFile is
>>>> inefficient when reading the same cache repeatedly), because it is not
>>>> as easy to solve as described in the issue (in terms of consistency).
>>>>
>>>> Please take a look and comment. Find the document attached or in
>>>> Cache/trunk/design-1.4.txt.

>>> Some comments:
>>> - I don't think that "store" should store anything in the memory cache. 
>>>   It wouldn't make much sense because this data is in memory already 
>>>   anyway, and secondly, I don't think you store and restore the same 
>>>   cache data in the same request.

>> That depends. To me this sounds as unusual as restoring one and the same 
>> cache item several times in the same request. Besides that, if you 
>> restored the item once, it is in memory, too. Which would mean that we 
>> do not need this functionality at all.

> No, that's slightly different. If you restore it then it is in the local 
> memory scope - you have to take care of caching yourself then. So it's 
> quite different whether restore stores it in the memory cache itself, or 
> whether you have to do it (like now). For both ways, "store" doesn't 
> really have to store anything in the memory cache. It's fine if it 
> shows up there if it's restored once though.

I think we are talking about two different things here. The one I started 
with is issue #10531, which requests us to implement a very simple 
mechanism in ezcCacheFileStorage (see the issue and the proposed design). 
In this issue the "memory cache" resides in the local memory scope of the 
current request. So basically it is what you meant when you said "you 
have to take care of caching yourself then".
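
To make the difference concrete, here is a rough sketch of what I mean 
for #10531. This is only an illustration: the wrapper class name and the 
simplified store()/restore() signatures are made up for this mail and do 
not reflect the final API.

<?php
// Sketch only: names and signatures are illustrative, not the eZ
// Components API.
class InRequestCachingStorage
{
    private $innerStorage;          // e.g. a file storage backend
    private $memoryCache = array(); // request-local: id => data

    public function __construct( $innerStorage )
    {
        $this->innerStorage = $innerStorage;
    }

    public function restore( $id )
    {
        // Repeated reads in the same request are served from memory,
        // so the disk is hit only once per item.
        if ( array_key_exists( $id, $this->memoryCache ) )
        {
            return $this->memoryCache[$id];
        }
        $data = $this->innerStorage->restore( $id );
        if ( $data !== false )
        {
            $this->memoryCache[$id] = $data;
        }
        return $data;
    }

    public function store( $id, $data )
    {
        // Open question in this thread: whether store() should also
        // put the item into the memory cache. Here it does, to cover
        // the store-then-restore-later use case mentioned below.
        $this->memoryCache[$id] = $data;
        return $this->innerStorage->store( $id, $data );
    }
}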

The second issue is the support for multi-level caching, which we talk 
about further below.

>> If we consider the use case of restoring a cache item multiple times 
>> during a request as valid (which is why we want to implement this 
>> feature), we should also consider the case where data is generated and 
>> stored in the cache in one part of an application, and other parts 
>> request this data again later, so that it needs to be restored.

>>> - Your proposed memory cache is something specifically implemented for 
>>>   the file storage backend. And the memory cache is only in-memory for 
>>>   the duration of one request. Now that we have memcache and apc caches, 
>>>   wouldn't it make more sense to allow for a fallback cache of some 
>>>   sort, so that you can basically tie two cache backends together. That 
>>>   would allow a "fast memory, slow disk" mechanism (such as you 
>>>   proposed) but also a "fast in-memory, slower memcached, slow disk" 
>>>   mechanism, or an "APC cache and a slow disk" mechanism. That'd 
>>>   mean that we'd need a normal memory cache backend too.

>> What you basically propose is to introduce multi-level caching (as it 
>> is done e.g. with processor memory caches). While I generally like the 
>> idea of implementing such a system (for the fun part), I think this 
>> will take a good portion of work to be realized. More about this below. 
>> Anyway, for systems like eZ Publish this would make sense to have.

> Yes, that's what I meant - and, I think we should have this at some 
> point. So I'd prefer a design that goes towards this.

I agree with you that we should consider multi-level caching at some 
point. But I doubt that we will find time for this in 2008.1. Issue 
#10531 talks about something different, basically. The question right 
here is whether we want to support this or not.
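
Just to make sure we mean the same thing: a multi-level design would 
roughly chain two backends like this. Again only a sketch; the backend 
objects and their restore()/store() methods are assumed to look like the 
simplified ones in my earlier example, not like the real API.

<?php
// Sketch of tying two backends together ("fast memory, slow disk").
// Illustration of the idea only, not a proposed eZ Components design.
class TwoLevelCache
{
    private $fast; // e.g. an in-memory or APC backend
    private $slow; // e.g. a file storage backend

    public function __construct( $fast, $slow )
    {
        $this->fast = $fast;
        $this->slow = $slow;
    }

    public function restore( $id )
    {
        $data = $this->fast->restore( $id );
        if ( $data !== false )
        {
            return $data;
        }
        $data = $this->slow->restore( $id );
        if ( $data !== false )
        {
            // Promote the item so later reads hit the fast level.
            $this->fast->store( $id, $data );
        }
        return $data;
    }

    public function store( $id, $data )
    {
        // Write through to keep both levels consistent.
        $this->fast->store( $id, $data );
        return $this->slow->store( $id, $data );
    }
}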

>>> - For all in-memory caches (also for apc/memcached backends to some 
>>>   extent) we should have some sort of mechanism that limits the amount 
>>>   of memory to be used. I think APC and Memcached have an internal limit 
>>>   already, but a new in-memory cache does not have easy limits there. 
>>>   Something like an LRU/LFU mechanism, for example.

>> As said before, I like the idea of implementing more complex caching 
>> stuff and especially the strategy algorithms, for the fun part of it. 
>> Anyway, I think this will add much too much complexity, especially for 
>> this memory cache. The current design is simple and does not slow down 
>> the file based caches too much.

> It's already a problem in eZ Publish, where there is no control over how 
> many persistent objects are cached in memory. It is quite a limitation, 
> although I think it's solved in the later versions. I would find this 
> important.

>> If we go for implementing caching strategies like LRU, we need to keep 
>> track of more data for each cache item (e.g. the last use time) and need 
>> to implement the selection algorithms, too. The memory used by a 
>> cache item is not easily determinable if you design this kind of cache 
>> as a general purpose, multi-level cache. For example: If you restore a 
>> cache item from APC and store it in the memory cache for faster second 
>> access, you have no idea how much memory this consumes. It could be an 
>> array with only an integer element, but also one with millions of 
>> objects (as an ArrayObject then).

> Instead of memory usage, you can of course also limit it to the *amount* 
> of cache items.

Which is pointless, as the size of these items can differ widely. Such 
techniques only work with fixed-size cache cells. However, if we go for 
a multi-level caching design, we need to solve this issue somehow.
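
The best we could probably do is a rough heuristic, e.g. measuring the 
serialized representation of an item. To be clear, this is only an 
approximation of the real memory footprint, and it costs an extra 
serialize() call per item:

<?php
// Heuristic only: the length of the serialized form approximates, but
// does not equal, the actual memory usage of the item.
function approximateItemSize( $data )
{
    return strlen( serialize( $data ) );
}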

>> In addition, for the idea of multi-level caching, you would need to 
>> implement the caching strategies for the APC and Memcache storages, too, 
>> to have this part consistent.

> Hmm, that's not a real necessity... as you can just set options on 
> different caching backends/levels. 

True.

>> In this sense I would say: Keep this stuff simple. If we go for 
>> implementing it, then in a very simple, but efficient way. Otherwise 
>> we should leave it to the user to make sure that they do not restore 
>> the same item many times during a request.

> I still think it's imperative that the amount of memory/cache items can 
> be limited in any sort of in-process memory cache.

We could of course go for such an approach. I still think it will 
overcomplicate the solution for issue #10531, because you need to take 
care of many more things if you want to do this. For example, we need to 
keep track of which item was used when, to implement LRU. When the 
maximum number of cached items is reached, we either need to run through 
all of them to find the least recently used one and purge it, or we need 
to store this bookkeeping data really efficiently beforehand.
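
For completeness, the usual trick to avoid the full scan is to keep the 
items in access order. In PHP that can exploit the fact that arrays keep 
insertion order; here is a sketch (class and method names made up, not a 
design proposal):

<?php
// LRU sketch: moving an item to the "most recently used" end is an
// unset() plus re-append, and the least recently used item is simply
// the first key of the array.
class SimpleLruCache
{
    private $items = array();
    private $capacity;

    public function __construct( $capacity )
    {
        $this->capacity = $capacity;
    }

    public function get( $id )
    {
        if ( !array_key_exists( $id, $this->items ) )
        {
            return null;
        }
        $data = $this->items[$id];
        unset( $this->items[$id] );
        $this->items[$id] = $data; // mark as most recently used
        return $data;
    }

    public function set( $id, $data )
    {
        if ( array_key_exists( $id, $this->items ) )
        {
            unset( $this->items[$id] );
        }
        elseif ( count( $this->items ) >= $this->capacity )
        {
            // Evict the least recently used item (the first key).
            reset( $this->items );
            unset( $this->items[key( $this->items )] );
        }
        $this->items[$id] = $data;
    }
}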

As said above, I think multi-level caching is sensible. I also think it 
could be fun to design and implement (although our current cache 
component does not offer much to support it yet). However, I won't have 
the time to do it for 2008.1, and I doubt any of us has. :/

Regards,
Toby
-- 
Mit freundlichen Grüßen / Med vennlig hilsen / With kind regards

Tobias Schlitt (GPG: 0xC462BC14) eZ Components Developer

[EMAIL PROTECTED] | eZ Systems AS | ez.no