Andrea Aime ha scritto: > Jody Garnett ha scritto: >> Andrea Aime wrote: >>>> becomes non 0 when it is replaced (so revision==0 always represents >>>> the "live" data). Having two columns is not bad, does having both >>>> help you ask for data in a specific range? Or could we get by with >>>> just a single column. >>> I'm doing performance tests now, to see how much performance I give >>> up by using my schema, especially on extracting the last revision, which >>> is the most common operation anyways. >> Let's assume then you will measure and make the best choice - and move >> on to another topic. >
Oh well, it just occurred to me that a single column design requires you to store in the version column either the fact that the row is the last one, or store the version it has expired, since when creating a new version you don't have any idea when the row was created. An alternative would be to use a negative number for the last revision, so that you can simply change its sign when it expires and have the revision created stored. Anyways, let's assume the first design for a second, since it allows me to do a quick check directly against my current schema: it's just like working with the expire column alone. Extracting the last revision is exactly the same query, I only use expired anyways. What makes the difference is extracting a specific revision, somethign which is used by rollback, diff and eventually in the future by branch merge. Extracting revision 2000 on my test data set requires the following query: select * from testdata td1 where td1.expired = ( select min(expired) from testdata td2 where td2.expired > 2000 and td2.id = td1.id) I've inserted this into my benchmark, with the same rules as the other queries (that is, fully scan the result) and... I lost patience after a few _minutes_ is was running and killed it (and had to kill postgres as well since it was still processing the query after I closed the connection), so I can only tell you it's at least two orders of magnitude slower than the one using two columns... And it's not because of the lack of indexes, since (expired, id) is indexed, but because the subquery forces a nested loop on all rows in the table... I don't know if there's a better way to express the same query, I'm still a bit sleepy... If you find a better one, let me know. Cheers Andrea ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys - and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ Geotools-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/geotools-devel
