Hi Frans,

> 
>         no, it gets more complicated. To be blunt: caching data is a
> tedious task which often goes wrong. Also the purpose of caching is
> misunderstood. My dreaded caching-is-hard-example: say you have 50
> customer objects in the cache. The application wants to load all
> customers who bought a product X. This will always cause a database
> query as you can't rely on the in-memory cache if all customer objects
> available in the database are actually IN the in-memory cache. So this
> will first load the customer data from the db, then you have to update
> the cache in-memory with the loaded data and return updated objects
from
> the cache and new ones not in the cache.
[Ben Kloosterman] 

I disagree... In most projects you will find a lot of tables that can
be completely cached, e.g. Products. There is no reason to cache all
tables. By having most tables cached, your remaining queries become
much simpler, as you already have partial information.

Many-to-many relationships to a large table like you mentioned will
always require a DB request, but the mid tier will just pass this
through to the DB. Note though that even that request would be simpler,
as there would be no need to do a join.

int productId = ProductsCollection["ProductX"].ProductID;

Populate the transactions via...
SELECT DISTINCT * FROM TransactionTable WHERE Product = @ProductID

Now your choice varies:
- If your customer DB is huge, you would have to use a join.
- If your customer table holds fewer than 50,000 rows, the entire table
can be in memory and you would simply do the following.
- You could show only customers active in the last week, which are
always cached.

CustCollection cust = new CustCollection();
foreach (int customerID in result)
        cust.Add(GlobalCustCollection[customerID]);

Basically, caching does not prevent you from going back to more complex
SQL, but in most cases it allows much simpler SQL.

Compare

SELECT DISTINCT * FROM TransactionTable WHERE Product = @ProductID

vs. a join on TransactionTable, Products and Customers.
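To make the shape of this concrete, here is a minimal C# sketch of the pattern: the Products and Customers tables live entirely in cache, so the only DB work left is the one flat SELECT against the transaction table. ProductsCollection, GlobalCustCollection and the DB call are hypothetical stand-ins, not anyone's actual API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Product  { public int ProductID; public string Name; }
class Customer { public int CustomerID; public string Name; }

static class CacheExample
{
    // Fully cached lookup tables (small, frequently used).
    public static Dictionary<string, Product> ProductsCollection =
        new Dictionary<string, Product>
        {
            { "ProductX", new Product { ProductID = 7, Name = "ProductX" } }
        };

    public static Dictionary<int, Customer> GlobalCustCollection =
        new Dictionary<int, Customer>
        {
            { 1, new Customer { CustomerID = 1, Name = "Alice" } },
            { 2, new Customer { CustomerID = 2, Name = "Bob" } }
        };

    // Stand-in for the one flat query that still hits the DB:
    // SELECT DISTINCT CustomerID FROM TransactionTable WHERE Product = @ProductID
    static List<int> QueryCustomerIdsByProduct(int productId)
    {
        return new List<int> { 1, 2 }; // would come from the database
    }

    public static List<Customer> LoadCustomersForProduct(string productName)
    {
        int productId = ProductsCollection[productName].ProductID; // cache lookup, no join
        return QueryCustomerIdsByProduct(productId)
            .Select(id => GlobalCustCollection[id])                // hydrate from the cache
            .ToList();
    }
}
```

The point of the sketch is that neither Products nor Customers appears in the SQL at all; the join happens as two dictionary lookups in memory.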

> 
>         This has to be done with every query on the data except PK
> fetches, which can first check the cache. However, even then, it can
> become a nightmare, as the logic doesn't know if the data is from the
> cache or from the db, so you can run into application state issues
> which will cause wrong decisions being made in your logic, because it
> works with stale data. (Example: desktop app runs on machine A and B.
> Entity E is loaded in both instances. E is updated by A. B doesn't see
> that update. B's logic again wants E. E is in the cache, so the logic
> gets the cached E. Not good, it misses A's updated data.)
[Ben Kloosterman] 

This is correct, which is why in most cases your cache has all the data
or the last x days. That avoids such issues entirely.

> 
>         A cache can therefore only be used to get uniquing for entity
> instances (that is: the data of an entity instance in the persistent
> storage loaded into memory). However this goes wrong as well on many
> occasions. Say I create a webapp. I store an entity in the viewstate,
> which is read from the cache. Now when the page gets a postback, the
> appDomain is recycled and the page gets again the same instance from
> the viewstate but a different instance from the datalayer. Same data,
> different instance.
[Ben Kloosterman] 
In most cases web servers have their own cache specifically for this
purpose. You would not use the mid-tier cache and add the extra hop.

The same issue also occurs when you write directly to the DB, e.g. some
product prices may be updated. In this case the web server cache and DB
are out of sync, which is the same as the mid-tier cache and web server
cache being out of sync.

> 
>         Moving on to the hard part: where to put the cache? And what
> do you want to cache? User state or application state? User state is
> not that hard, but hardly effective. Application state is more
> effective but impossible to do: in a webfarm, where to put the
> application state cache?
[Ben Kloosterman] 
Different caches have different purposes. Web server caches are used on
each web server, provided you post back to the same server. If not,
your UI does not have caching.

In large farms you should have your business logic and other system
interfaces in a separate layer. This is where you cache certain tables.

> On a separate box? A similar problem occurs when a desktop app is
> querying the same database on a lot of desktops. A separate box sounds
> appealing, but it requires access security to get a safe cache, it
> requires connections to get the cached data... hmmm... sounds
> familiar, the database also has that. Better yet: the database system
> also caches data, in memory, caches query plans (so querying the cache
> if an object is there matching some predicate is optimized in the
> database) and other nice things making performance as high as
> possible. Creating your own cache will likely result in rewriting an
> engine similar to SqlServer but then for objects.
[Ben Kloosterman] 
Yes, that is the purpose, except:
        - You don't have complicated, difficult-to-maintain SQL and the
associated bugs.
        - You don't have one idiot or one report bringing the whole DB
down.
        - You don't have a DB where, because every app uses the same
ID, anyone can change things and you can't track it.
        - You don't have to use high-end UNIX boxes.
        - If you have performance problems, you can add machines and
get a benefit...
        - You don't have the latest Equities Sales Director doing Excel
cross joins on your tables, etc.

Yet you still retain all the advantages of robust persistence, backups
etc.

I think the US Department of Defence has mandated that 2-tier not be
used for more than 300 simultaneous users. This is a good choice in my
experience; web sites tend to be pretty light.

> 
> > > What strategy do you use to keep the cache in sync with the
> database,
> > > especially in the context of a Winforms App?
> >
> > [Ben Kloosterman]
> > This is the most interesting bit :-)  It really depends on
> > how much you want to scale - with caching your Mid Tiers are
> > very fast and can often handle hundreds or thousands of users
> > - so often 1 server and a standby is enough .  On a Single
> > Server what I do is this :
> > On a successful DB update, delete or insert, I update the
> > cache.  In fact I even do this with multiple servers when
> > there is 1 per geographic location (Sydney doesn't need to
> > know Melbourne's updates).
> >
> > For multiple servers per location you have at least 3 strategies.
> > 1. Put time stamps on your records and poll the tables
> > looking for changes.  Data is out of synch by the time of the poll.
> > 2. Send cache updates to all other servers.  This requires
> > the mid tier to know about the other servers.
> > 3. DB table triggers.
> 
>         But why all this overhead? What do you win? You need a lot of
> overhead to get everything in sync and for what? 
[Ben Kloosterman] 
Actually, no real performance overhead, but a lot of code. Furthermore,
on most sites you simply have one mid-tier server and no overhead.

> To save some connections to a database? Isn't that rather moot when
> you need database polling to get that efficiency gain?
[Ben Kloosterman] 
Only in case 1, and a simple SELECT once a minute is not significant...
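For what it's worth, strategy 1 above fits in a few lines. This is a minimal sketch assuming a hypothetical Products table with a LastModified column; the DB query is passed in as a delegate so the idea stands on its own:

```csharp
using System;
using System.Collections.Generic;

class ProductRow
{
    public int ProductID;
    public decimal Price;
    public DateTime LastModified;
}

class ProductCachePoller
{
    public readonly Dictionary<int, ProductRow> Cache =
        new Dictionary<int, ProductRow>();
    DateTime lastSync = DateTime.MinValue;

    // fetchChanged stands in for:
    // SELECT * FROM Products WHERE LastModified > @lastSync
    public void Poll(Func<DateTime, List<ProductRow>> fetchChanged)
    {
        DateTime pollStart = DateTime.UtcNow;
        foreach (ProductRow row in fetchChanged(lastSync))
            Cache[row.ProductID] = row;   // add new rows, overwrite stale ones
        lastSync = pollStart;             // stale by at most one poll interval
    }
}
```

Run Poll from a once-a-minute timer and the cache lags the DB by at most that interval, which is the trade-off described above.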
> 
>         I also doubt the 'with caching your Mid Tiers are very fast
> and can often handle hundreds or thousands of users - so often 1
> server and a standby is enough' claim. Based on which facts is this
> claim made? I gave a simple example which makes caching for
> performance a farce. (You also say later on that caching for
> performance is not the goal, which is correct as such a goal can't be
> reached.)
[Ben Kloosterman] 

Remember, the goal is not to cache the huge transactional tables. The
goal is to cache frequently used tables (Products, Customers etc.)
completely and leave the transaction tables to SQL.

In fact, in most of my designs 'caching' is a misnomer: the data is an
in-memory collection and is merely persisted to the DB to handle a
crash.

> 
> > Remember though the goal is not performance ( as you are
> > replacing functions SQL server already does) , the cache
> > insures that the business layers has immediate access to
> > frequent / key information , this allows a good middle tier
> > design which is simple and allows simple DB interactions.
> 
>         This is not true. The BL always has to consult the only real
> repository in the system: the persistent storage, to make sure the
> data it HAS TO work with is correct. Like the example I gave. You can
> NEVER rely on a cache in memory if the data is correct, because right
> before the database query is executed another thread could have added
> a customer who bought product X.
[Ben Kloosterman] 
You can lock in-memory collections...
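A minimal sketch of what I mean by that, with hypothetical names: every read-modify-write on the shared cache happens under one lock, so a customer added by another thread can't be missed halfway through an operation.

```csharp
using System.Collections.Generic;

class LockedCustomerCache
{
    readonly object sync = new object();
    readonly Dictionary<int, string> customers = new Dictionary<int, string>();

    public void Add(int id, string name)
    {
        lock (sync) { customers[id] = name; }   // writers serialize here
    }

    // Take a snapshot under the lock; callers then work on a consistent copy.
    public List<string> Snapshot()
    {
        lock (sync) { return new List<string>(customers.Values); }
    }
}
```

Readers get a consistent copy taken at one instant, which is no worse than the snapshot a DB query gives you.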

> Your query consulting stale data will miss that customer, which could
> cause false decisions being made.
>
>         'Immediate access' is also not that easy. True, storing
> customer objects in a hashtable based on their single-field PK value
> is not that hard, and finding them back isn't either.
>
>         It gets tougher when you want to get a set of data based on
> ANY given predicate. Even 'all customers from France' is faster read
> from the DB than from a cache, because it will require an index IN
> MEMORY on the country field; otherwise you'll get a linear search in
> memory through objects, which I have the feeling is slower than the
> average RDBMS is able to put on the table.
[Ben Kloosterman] 
Nothing wrong with having an in-memory index if your design shows it is
frequently needed. In most cases you would simply have collections
which point to frequently used fields; that is good OO design.
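Such an index is just a second dictionary kept in step with the first. A minimal sketch, assuming a hypothetical customer shape: next to the PK dictionary, keep one keyed by Country, so "all customers from France" is a hash lookup rather than a linear scan over every object.

```csharp
using System.Collections.Generic;

class Cust { public int CustomerID; public string Country; }

class IndexedCustomerCache
{
    readonly Dictionary<int, Cust> byId = new Dictionary<int, Cust>();
    readonly Dictionary<string, List<Cust>> byCountry =
        new Dictionary<string, List<Cust>>();

    public void Add(Cust c)
    {
        byId[c.CustomerID] = c;
        List<Cust> list;
        if (!byCountry.TryGetValue(c.Country, out list))
            byCountry[c.Country] = list = new List<Cust>();
        list.Add(c);   // keep the index in step with the primary cache
    }

    public List<Cust> FromCountry(string country)
    {
        List<Cust> list;
        return byCountry.TryGetValue(country, out list) ? list : new List<Cust>();
    }
}
```

The cost is that every mutation must touch both dictionaries, which is exactly the bookkeeping an RDBMS does for its indexes.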
> 
>         The only caching which DOES work are: caching of processed
> results and caching of never-changing data.
[Ben Kloosterman] 
I would say infrequently changing data... In most DBs I look at, you
find 10% of the tables frequently used; the rest are almost static.

>  For example a rendered
> webcontrol, cached for 1 minute. Not only do you save the database
> roundtrips, you also save the processing time.
>
>         Often, people should think why they want caching in the first
> place. To save webserver power because the website gets 500,000 hits
> per day? Perhaps page caching with 1 minute per page will help. Often
> that will give much more of a performance boost than low-level
> caching with a lot of overhead.

[Ben Kloosterman] 
Interesting to read your comments, and most are quite valid. It looks
like you are mainly looking at things from a web site perspective, and
for small-to-medium web sites I see little benefit in middle-tier
caching, especially since you use front-end caching, which is shared
amongst thousands of users anyway.

On the other hand, if you are designing a system to handle 2000+
WinForms users, you have other systems like Excel hitting your DB, new
sales reports etc., and you have to export and import data from other
systems; then things quickly change...

I will say this: if you have requirements that need a lot of
flexibility, don't cache those requirements. I often design systems
where reports go 2-tier to the DB or even to a replicated DB. And
everyone complains about maintaining the report code...


Regards , 

Ben 


> 
>                 Frans.
> 

===================================
This list is hosted by DevelopMentor(r)  http://www.develop.com
Some .NET courses you may be interested in:

Essential .NET: building applications and components with CSharp
August 30 - September 3, in Los Angeles
http://www.develop.com/courses/edotnet

View archives and manage your subscription(s) at http://discuss.develop.com
