Andrea Aime ha scritto:
> Jody Garnett ha scritto:
>> Andrea Aime wrote:
>>>> becomes non 0 when it is replaced (so revision==0 always represents 
>>>> the "live" data). Having two columns is not bad, does having both 
>>>> help you ask for data in a specific range? Or could we get by with 
>>>> just a single column.
>>> I'm doing performance tests now, to see how much performance I give
>>> up by using my schema, especially on extracting the last revision, which
>>> is the most common operation anyways.
>> Let's assume then you will measure and make the best choice - and move 
>> on to another topic.
> 

Oh well, it just occurred to me that a single column design requires
you to store in the version column either the fact that the row
is the last one, or store the version it has expired, since when
creating a new version you don't have any idea when the row was
created.
An alternative would be to use a negative number for the last revision,
so that you can simply change its sign when it expires and have
the revision created stored.

Anyways, let's assume the first design for a second, since it allows
me to do a quick check directly against my current schema: it's just
like working with the expire column alone.

Extracting the last revision is exactly the same query, I only use
expired anyways. What makes the difference is extracting a specific
revision, somethign which is used by rollback, diff and eventually
in the future by branch merge. Extracting revision 2000 on my
test data set requires the following query:

select *
from testdata td1
where td1.expired = (
   select min(expired)
   from testdata td2
   where td2.expired > 2000
   and td2.id = td1.id)

I've inserted this into my benchmark, with the same rules as the
other queries (that is, fully scan the result) and... I lost patience
after a few _minutes_ is was running and killed it (and had to kill
postgres as well since it was still processing the query after I closed 
the connection), so I can only tell you it's at least two orders of 
magnitude slower than the one using two columns...

And it's not because of the lack of indexes, since (expired, id)
is indexed, but because the subquery forces a nested loop on all rows
in the table... I don't know if there's a better way to express the
same query, I'm still a bit sleepy...
If you find a better one, let me know.

Cheers
Andrea


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Geotools-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geotools-devel

Reply via email to