Re: [ADVANCED-DOTNET] MiddleTier Caching was Abstracting SQL

Frans Bouma Tue, 12 Oct 2004 02:08:48 -0700

> -----Original Message-----
> From: Ben Kloosterman [mailto:[EMAIL PROTECTED] 
> > In our case we must implement Object Query - long story, business 
> > requirement. However, I see these two ideas complementing each other; 
> > there is no conflict.
> [Ben Kloosterman]
> You can use them as you can get some of the data from the 
> cache . Often O/R layers become much simpler.


        No, it gets more complicated. To be blunt: caching data is a
tedious task which often goes wrong. Also, the purpose of caching is
misunderstood. My dreaded caching-is-hard example: say you have 50
customer objects in the cache. The application wants to load all
customers who bought product X. This will always cause a database
query, as you can't tell from the in-memory cache whether all customer
objects available in the database are actually IN the cache. So this
will first load the customer data from the db, then you have to update
the in-memory cache with the loaded data and return the updated objects
from the cache plus the new ones which weren't in the cache yet.
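
        A minimal sketch of that flow, in Python for brevity rather
than .NET. The names Customer, CustomerCache, merge and
customers_who_bought are made up for illustration, and sqlite3 merely
stands in for the real database:

```python
import sqlite3


class Customer:
    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name


class CustomerCache:
    """Hypothetical in-memory cache keyed on the primary key."""

    def __init__(self, conn):
        self.conn = conn
        self.by_id = {}

    def merge(self, customer_id, name):
        # Refresh the cached instance if there is one, otherwise add a new one.
        cached = self.by_id.get(customer_id)
        if cached is None:
            cached = Customer(customer_id, name)
            self.by_id[customer_id] = cached
        else:
            cached.name = name
        return cached

    def customers_who_bought(self, product):
        # The cache may hold only a subset of the table, so it cannot
        # answer this predicate: the database is queried every time.
        rows = self.conn.execute(
            "SELECT DISTINCT c.id, c.name FROM customer c "
            "JOIN purchase p ON p.customer_id = c.id "
            "WHERE p.product = ? ORDER BY c.id",
            (product,),
        ).fetchall()
        return [self.merge(cid, name) for cid, name in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (customer_id INTEGER, product TEXT);
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO purchase VALUES (1, 'X'), (3, 'X'), (2, 'Y');
    """)
    cache = CustomerCache(conn)
    # Only customer 1 is cached beforehand, and with outdated data.
    alice = cache.merge(1, "Alice (old)")
    result = cache.customers_who_bought("X")
    return cache, alice, result
```

        Note that the cache saves no roundtrip here: the query runs
anyway, and on top of that every returned row has to be merged into the
cache.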

        This has to be done with every query on the data except PK
fetches, which can check the cache first. However, even then it can
become a nightmare, as the logic doesn't know whether the data comes
from the cache or from the db, so you can run into application state
issues which cause wrong decisions to be made in your logic, because it
works with stale data. Example: a desktop app runs on machines A and B.
Entity E is loaded in both instances. E is updated by A. B doesn't see
that update. B's logic wants E again. E is in the cache, so the logic
gets the cached E. Not good: it misses A's updated data.
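
        The A/B scenario can be reproduced in a few lines. This is only
a sketch: Database and AppInstance are hypothetical stand-ins for the
shared storage and for one running copy of the desktop app.

```python
class Database:
    """Stand-in for the shared persistent storage."""

    def __init__(self):
        self.rows = {}

    def read(self, key):
        return self.rows[key]

    def write(self, key, value):
        self.rows[key] = value


class AppInstance:
    """One running copy of the app, with its own private cache."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def load(self, key):
        # PK fetch: consult the cache first, go to the db only on a miss.
        if key not in self.cache:
            self.cache[key] = self.db.read(key)
        return self.cache[key]

    def update(self, key, value):
        self.db.write(key, value)
        self.cache[key] = value  # only THIS instance's cache is updated


db = Database()
db.write("E", "v1")
a, b = AppInstance(db), AppInstance(db)
a.load("E")
b.load("E")
a.update("E", "v2")   # A changes E
stale = b.load("E")   # B is served from its own cache
fresh = db.read("E")  # what the database actually holds
```

        Here `stale` still holds the old value while `fresh` holds A's
update, and nothing in B's logic can tell the difference.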

        A cache can therefore only be used to get uniquing for entity
instances (that is: the data of an entity instance in the persistent
storage loaded into memory). However, this goes wrong as well on many
occasions. Say I create a webapp. I store an entity, which was read
from the cache, in the viewstate. Now when the page gets a postback
after the appDomain has been recycled, the page gets the same instance
again from the viewstate but a different instance from the datalayer.
Same data, different instance.
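
        A rough illustration of how uniquing breaks once an entity
leaves the process. JSON serialization is used here as a stand-in for
the viewstate, and Entity, identity_map and fetch are invented names:

```python
import json


class Entity:
    def __init__(self, pk, name):
        self.pk = pk
        self.name = name


identity_map = {}  # hypothetical uniquing cache: one instance per PK


def fetch(pk):
    # Pretend the entity data comes from the database on a cache miss.
    if pk not in identity_map:
        identity_map[pk] = Entity(pk, "Customer %d" % pk)
    return identity_map[pk]


first = fetch(1)
second = fetch(1)
same_instance_from_cache = first is second  # uniquing works in-process

viewstate = json.dumps(vars(first))  # the page stores the entity
identity_map.clear()                 # appDomain recycled: cache is gone

from_viewstate = Entity(**json.loads(viewstate))
from_datalayer = fetch(1)
```

        After the round trip, `from_viewstate` and `from_datalayer`
carry the same field values but are two separate objects, so any code
relying on instance identity is now wrong.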

        Moving on to the hard part: where to put the cache? And what do
you want to cache? User state or application state? User state is not
that hard, but hardly effective. Application state is more effective
but impossible to do: in a webfarm, where do you put the application
state cache? On a separate box? A similar problem occurs when a desktop
app is querying the same database from a lot of desktops. A separate
box sounds appealing, but it requires access security to get a safe
cache, it requires connections to get the cached data... hmmm... sounds
familiar, the database also has that. Better yet: the database system
also caches data in memory, caches query plans (so asking whether an
object matching some predicate is there is optimized in the database)
and does other nice things to make performance as high as possible.
Creating your own cache will likely result in rewriting an engine
similar to SQL Server, but then for objects.

> > What strategy do you use to keep the cache in sync with the database, 
> > especially in the context of a Winforms App?
> 
> [Ben Kloosterman]
> This is the most interesting bit :-)  It really depends on 
> how much you want to scale - with caching your Mid Tiers are 
> very fast and can often handle hundreds or thousands of users 
> - so often 1 server and a standby is enough .  On a Single 
> Server what I do is this : 
> On a successful DB update , delete or insert, I update the 
> cache.  In fact I even do this with multiple servers when 
> there is 1 per geo graphic location ( Sydney doesn't need to 
> know Melbourne's updates) . 
> 
> For multiple servers per location you have at least 3 strategies . 
> 1. Put time stamps on your records and poll the tables 
> looking for changes .  Data is out of synch by the time of the poll. 
> 2. Send Cache updates to all other servers.  This required 
> the mid tier to know about the other servers. 
> 3. DB table Triggers 

        But why all this overhead? What do you win? You need a lot of
overhead to get everything in sync and for what? To save some
connections to a database? Isn't that rather moot when you need database
polling to get that efficiency gain? 

        I also doubt the 'with caching your Mid Tiers are very fast and
can often handle hundreds or thousands of users - so often 1 server and
a standby is enough' claim. Based on which facts is this claim made? I
gave a simple example which makes caching for performance a farce. (You
also say later on that caching for performance is not the goal, which is
correct as such a goal can't be reached).

> Remember though the goal is not performance ( as you are 
> replacing functions SQL server already does) , the cache 
> ensures that the business layer has immediate access to  
> frequent / key information , this allows a good middle tier 
> design which is simple and allows simple DB interactions. 

        This is not true. The BL always has to consult the only real
repository in the system, the persistent storage, to make sure the data
it HAS TO work with is correct. Like the example I gave: you can NEVER
rely on an in-memory cache to hold correct data, because right before
the database query is executed another thread could have added a
customer who bought product X. Your query consulting stale data will
miss that customer, which could cause wrong decisions to be made.

        'Immediate access' is also not that easy. True, storing customer
objects in a hashtable based on their single-field PK value is not that
hard, and finding them back isn't either. It gets tougher when you want
to get a set of data based on ANY given predicate. Even 'all customers
from France' is likely faster when read from the DB than from a cache,
because the cache requires an IN-MEMORY index on the country field;
otherwise you get a linear search through the objects in memory, which
I have the feeling is slower than what the average RDBMS can deliver.
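
        To make the difference concrete, a small sketch with made-up
data. The PK lookup is a plain dictionary hit, while the country
predicate needs either a scan over everything or a second, hand-built
index (country_index is a name invented for this example):

```python
from collections import defaultdict

customers = [
    {"id": 1, "name": "Alice", "country": "France"},
    {"id": 2, "name": "Bob", "country": "Germany"},
    {"id": 3, "name": "Carol", "country": "France"},
]

# PK lookup: one hashtable keyed on the single-field PK value.
by_pk = {c["id"]: c for c in customers}


def by_country_scan(country):
    # Without an index: a linear search through every cached object.
    return [c for c in customers if c["country"] == country]


# With an index: an extra structure per queried field, which has to be
# kept in sync on every insert, update and delete.
country_index = defaultdict(list)
for c in customers:
    country_index[c["country"]].append(c)


def by_country_indexed(country):
    return country_index.get(country, [])
```

        Every additional predicate means another index like this one to
build and maintain, which is the work an RDBMS already does for you.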

        The only kinds of caching which DO work are caching of processed
results and caching of never-changing data. For example: a rendered
webcontrol, cached for 1 minute. Not only do you save the database
roundtrips, you also save the processing time. 
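
        A minimal sketch of such a result cache, assuming a fixed
lifetime per entry. TimedResultCache and get_or_render are hypothetical
names, not an ASP.NET API; the clock is injectable only to make the
behaviour easy to check:

```python
import time


class TimedResultCache:
    """Caches finished output (e.g. rendered markup) for a fixed time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (time stored, rendered value)

    def get_or_render(self, key, render):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: no db roundtrip, no rendering
        value = render()     # expired or missing: do the full work once
        self.entries[key] = (now, value)
        return value
```

        With a lifetime of 60 seconds, a page hit many times per minute
is rendered once per minute, and the only staleness to reason about is
"at most one minute old".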

        Often, people should think about why they want caching in the
first place. To save webserver power because the website gets 500,000
hits per day? Perhaps page caching with a lifetime of 1 minute per page
will help. Often that will give a much bigger performance boost than
low-level caching with a lot of overhead.

                Frans.

===================================
This list is hosted by DevelopMentor®  http://www.develop.com

View archives and manage your subscription(s) at http://discuss.develop.com
