On 30 Jun 2011, at 11:42, Simon Slavin wrote:

> 
> On 30 Jun 2011, at 7:28pm, Simon Gornall wrote:
> 
>> I don't care if an update is made to the *value* of 'otherColumnName'. I 
>> only care if the set-of-objects-that-would-be-returned could differ, not the 
>> properties of those objects.
> 
> Do a
> 
> SELECT group_concat(rowid) WHERE …
> 
> when you do your original fetch and when you want to do the check, and 
> compare the two results.  The answer returned is just one long string so all 
> you need is a routine to compare the two strings and see if they're equal.
> 
> SQLite is extremely efficient at this stuff, and you're unlikely to be able 
> to improve on it for your purposes without writing your own database system 
> from the ground up.

I'm not really sure I'm explaining myself well here. Let me try again:

I've got models (basically object instances of a database row) and entities 
(which handle database operations plus miscellaneous admin for a table). Each 
entity keeps a cache of all the models previously selected from its table.

When I perform a query, the following happens:

        - create a prepared statement or fetch from the statement-cache if 
possible
        - execute statement with arguments
        - step over each row returned:
                - does the uuid (a standard column) match one in our cache?
                        - yes: add the model from the cache to the 
set-of-results
                        - no: create a new object, add to the cache, add object 
to set-of-results
        - return set-of-results
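
For concreteness, that sequence could be sketched roughly like this in Python 
with the stdlib sqlite3 module; the class names, the `uuid` column, and the 
general API shape are all my assumptions about the real code, not the actual 
implementation:

```python
import sqlite3

class Model:
    """Hypothetical wrapper for one database row (names are assumptions)."""
    def __init__(self, row):
        self.uuid = row["uuid"]
        self.values = dict(row)

class Entity:
    """Hypothetical entity: runs queries for one table, caching models."""
    def __init__(self, db_path, table):
        self.db = sqlite3.connect(db_path)
        self.db.row_factory = sqlite3.Row
        self.table = table
        self.model_cache = {}          # uuid -> Model

    def fetch(self, where_sql, args=()):
        # sqlite3 keeps its own internal prepared-statement cache, standing
        # in for the "fetch from the statement-cache" step above.
        sql = "SELECT * FROM %s WHERE %s" % (self.table, where_sql)
        results = []
        for row in self.db.execute(sql, args):
            # Does the uuid match one already in our cache?
            model = self.model_cache.get(row["uuid"])
            if model is None:
                model = Model(row)
                self.model_cache[row["uuid"]] = model
            results.append(model)
        return results
```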


What I've just added is a result cache within the entity, keyed by a hash of 
(the statement and the arguments to that statement). The sequence above is now 
modified to:

        - create a prepared statement or fetch from the statement-cache if 
possible
        - check if cached-result already available
                - yes: return cached-result
                - no: execute statement with arguments
                        - step over each row returned:
                                - does the uuid (a standard column) match one 
in our cache?
                                        - yes: add the model from the cache to 
the set-of-results
                                        - no: create a new object, add to the 
cache, add object to set-of-results
                        - cache set-of-results as cached-result
                        - return set-of-results
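
The added result-cache layer could be sketched like so (again with assumed 
names; here the (statement, arguments) tuple itself serves as the dictionary 
key rather than an explicit hash, which is equivalent for illustration):

```python
import sqlite3

class CachingEntity:
    """Sketch of the result-cache layer described above (names assumed)."""
    def __init__(self, db):
        self.db = db
        self.result_cache = {}         # (sql, args) -> cached result list

    def fetch(self, sql, args=()):
        key = (sql, args)              # args must be hashable, e.g. a tuple
        cached = self.result_cache.get(key)
        if cached is not None:
            return cached              # 3rd-pass path: SQLite never called
        results = self.db.execute(sql, args).fetchall()
        self.result_cache[key] = results
        return results

    def invalidate(self):
        # Current coarse policy: any write discards every cached result set.
        self.result_cache.clear()
```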

Running some simple timings (fetch 1000 objects based on the same query 
criteria each time), I have:

Test Case '-[testResultCache testResultsCache]' started.
1st pass (no caches)     : (1000 models) 0.053308
2nd pass (Q cached)      : (1000 models) 0.001687
3rd pass (Q,R cached)    : (1000 models) 0.000019
4th pass (Q cached)      : (1000 models) 0.002163
Test Case '-[testResultCache testResultsCache]' passed (0.074 seconds).

In this case, the passes are as follows:

        - 1st pass      : nothing in cache, create all objects afresh, return 
set
        - 2nd pass      : prepared statement cached, objects all cached
        - 3rd pass      : prepared statement cached, objects all cached, 
result-set cached
        - 4th pass      : a change was made to an object, so that object must 
be updated; discard the result-set, then proceed as in the 2nd pass

So, the goal is to maximise how often the 3rd-pass path is actually taken. At 
the moment, if an entity receives *any* write operation, I discard *all* 
entries in that entity's result-cache. What I'm trying to do is discard only 
the cache entries affected by an UPDATE when it takes place. For that I need 
to know which columns the query depends on (I already know the columns 
affected by the update).
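
A minimal sketch of the finer-grained invalidation I'm after, assuming each 
cache entry records the set of columns its query depends on (all the names 
here are invented for illustration):

```python
class ResultCache:
    """Result cache whose entries each carry a column-dependency set."""
    def __init__(self):
        self.entries = {}      # key -> (frozenset of columns, results)

    def store(self, key, depends_on, results):
        self.entries[key] = (frozenset(depends_on), results)

    def lookup(self, key):
        entry = self.entries.get(key)
        return entry[1] if entry is not None else None

    def invalidate_columns(self, updated_columns):
        # Drop only entries whose dependency set intersects the updated
        # columns; everything else stays valid.
        updated = set(updated_columns)
        self.entries = {k: v for k, v in self.entries.items()
                        if not (v[0] & updated)}
```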

I'm not trying to replace or re-implement SQLite, and I'm sure it's very 
efficient. I'm simply avoiding calling it when I already have the answer I 
need, and trying to optimise the decision about when those stored answers can 
be reused. You can see from the timings above that the 3rd pass is 
significantly faster, and the application can do a *lot* of queries :)

If it's not possible, then fine. I'll just modify the API to let clients 
optionally declare the columns a given query depends on. If they provide that 
information, I can be clever about discarding cache entries; if they don't, 
I'll just discard everything on an UPDATE, as I currently do. I was just 
looking to see whether there was a way to figure out the column dependencies 
via inspection.
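
That fallback API could take roughly this shape (a sketch; `depends_on` and 
the other names are my invention). A `None` dependency set means "unknown", 
which forces the conservative discard-everything behaviour on any update:

```python
class QueryCache:
    """Result cache where column dependencies are optionally client-declared."""
    def __init__(self):
        self.entries = {}      # key -> (deps frozenset or None, results)

    def store(self, key, results, depends_on=None):
        deps = frozenset(depends_on) if depends_on is not None else None
        self.entries[key] = (deps, results)

    def on_update(self, updated_columns):
        updated = set(updated_columns)
        # Entries with unknown dependencies (None) are always discarded;
        # declared ones survive unless they intersect the updated columns.
        self.entries = {k: (deps, res)
                        for k, (deps, res) in self.entries.items()
                        if deps is not None and not (deps & updated)}
```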

Simon.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
