James Busser wrote:
> On Apr 29, 2006, at 4:35 AM, Tim Churches wrote:
>
>> (I keep wondering whether we should have used an EAV pattern for storage
>
> Educated myself (just a bit) here
>
> http://www.health-itworld.com/newsitems/2006/march/03-22-06-news-hitw-dynamic-data
> http://www.pubmedcentral.gov/articlerender.fcgi?artid=61439
> https://tspace.library.utoronto.ca/handle/1807/4677
> http://www.jamia.org/cgi/content/abstract/7/5/475
Thanks - we have copies of the latter three papers but I hadn't seen the first article.

Of course, PostgreSQL muddies the waters, because the way it works under the bonnet (hood, engine cover) is rather similar to (but not identical to) the EAV model - but all that is hidden behind the SQL interface, which is not easy to bypass. We really wanted to use openEHR when we started in 2003 - openEHR can be seen as a very sophisticated metadata layer which can be used with an EAV-like back-end storage schema - but no openEHR storage engines were available then, and when I asked again earlier this year, there were still none available (as open source, or closed source on a commercial basis) in a production-ready form.

Anyway, plain old PostgreSQL tables work rather well, and are fast and reliable for large datasets - but I now think we will need to build our own replication engine. What we really need is multi-master DB replication which can cope with slow and unreliable networks (hence it has to use asynchronous updates, not tightly-coupled synchronous updates such as multi-phase commits) and with frequent network partitions. If we are funded to do that, then we'll write it in Python, probably using a stochastic "epidemic" model for the data propagation algorithm and some variation on Lamport logical clocks for data synchronisation. It also needs to propagate schema changes. Hopefully we can make it sufficiently general that it has utility for GNUmed, e.g. when a copy of a clinic database is taken away on a laptop for use in the field - say at a nursing home or a satellite clinic - and network connection and synchronisation only occur occasionally. However, we need the replication to scale to 200 to 300 sites. Interestingly, most of the commercial multi-master database replication products just gloss over the issue of data integrity, or leave it up to the application - but research in the 1990s showed that this is not good enough in more complex situations with more than a few master DB instances.

>> - Slony would have worked with that..)

There is a Slony-2 project, being done here in Sydney, but it is focussing on multi-master synchronous updates, i.e. multiple servers in a single data centre, for load-balancing of write tasks as well as read tasks (for which Slony-1 can already be used to facilitate load-balancing).

Sorry to rave on, but don't let anyone tell you that there are no fundamental data management issues left to be addressed by open source or commercial software.

Tim C

_______________________________________________
Gnumed-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnumed-devel
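The EAV trade-off mentioned above can be made concrete with a small sketch. The snippet below is only an illustration with invented names (the patient attributes and the pivot helper are made up for the example, not the actual schema under discussion): the same record held as one conventional "wide" row versus as generic entity-attribute-value triples, plus the pivot step that EAV pushes onto query-time code or a metadata layer (roughly the role an openEHR-style layer would play).

# Hypothetical illustration of conventional vs. EAV storage; not real schema.
from collections import defaultdict

# Conventional table: one column per attribute, easy to query and constrain.
conventional_row = {
    "patient_id": 42,
    "systolic_bp": 128,
    "diastolic_bp": 84,
    "measured_on": "2006-04-29",
}

# EAV: one narrow table of (entity, attribute, value) rows. New attributes need
# no schema change, but typing, constraints and joins all move into application
# or metadata-layer code.
eav_rows = [
    (42, "systolic_bp", "128"),
    (42, "diastolic_bp", "84"),
    (42, "measured_on", "2006-04-29"),
]

def pivot(rows):
    """Rebuild per-entity records from EAV triples (the query-time cost of EAV)."""
    records = defaultdict(dict)
    for entity, attribute, value in rows:
        records[entity][attribute] = value
    return dict(records)

if __name__ == "__main__":
    print(pivot(eav_rows))   # {42: {'systolic_bp': '128', ...}}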

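On the replication idea itself - asynchronous multi-master updates propagated by a stochastic "epidemic" (gossip) process and ordered with some variation on Lamport logical clocks - a minimal, assumption-laden sketch might look like the following. The names (Replica, Row, gossip_round) are hypothetical, and the conflict rule shown (last-writer-wins on the Lamport timestamp, with the site id as tie-breaker) is only one possible variation, not the design actually proposed in the message above.

# Minimal sketch of epidemic propagation plus Lamport clocks; all names hypothetical.
import random
from dataclasses import dataclass

@dataclass
class Row:
    value: str
    clock: int    # Lamport timestamp of the last write
    site: str     # tie-breaker when timestamps are equal

class Replica:
    def __init__(self, site_id: str):
        self.site = site_id
        self.clock = 0                   # local Lamport counter
        self.rows: dict[str, Row] = {}   # primary key -> latest known version

    def write(self, key: str, value: str) -> None:
        """A local update: tick the clock and stamp the row."""
        self.clock += 1
        self.rows[key] = Row(value, self.clock, self.site)

    def merge(self, incoming: dict[str, Row]) -> None:
        """Apply rows received from a peer, keeping the 'latest' version.

        Ordering is (clock, site): a higher Lamport clock wins, with the site
        id as an arbitrary but deterministic tie-breaker, so all replicas
        converge to the same value regardless of message order.
        """
        for key, theirs in incoming.items():
            mine = self.rows.get(key)
            if mine is None or (theirs.clock, theirs.site) > (mine.clock, mine.site):
                self.rows[key] = theirs
            # Receiving a message lifts the local clock to at least the
            # sender's, so the next local write is ordered after it.
            self.clock = max(self.clock, theirs.clock)

def gossip_round(replicas: list["Replica"]) -> None:
    """One 'epidemic' round: every replica pushes its state to one random peer."""
    for r in replicas:
        peer = random.choice([p for p in replicas if p is not r])
        peer.merge(dict(r.rows))

if __name__ == "__main__":
    sites = [Replica(s) for s in ("clinic", "laptop", "satellite")]
    sites[0].write("patient:42:phone", "555-0101")
    sites[1].write("patient:42:phone", "555-0202")   # concurrent, conflicting edit
    for _ in range(10):   # a few gossip rounds; convergence is probabilistic
        gossip_round(sites)
    for r in sites:
        row = r.rows.get("patient:42:phone")
        print(r.site, "->", row.value if row else "not yet replicated")

In a real engine the per-row timestamps would presumably live alongside the PostgreSQL rows, deletes and schema changes would need their own versioned log, and peer selection would have to tolerate sites (such as a laptop in the field) that stay offline for long periods.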