James Busser wrote:
> On Apr 29, 2006, at 4:35 AM, Tim Churches wrote:
>
>> (I keep wondering whether we should have used an EAV pattern for storage
>
> Educated myself (just a bit) here
>
> http://www.health-itworld.com/newsitems/2006/march/03-22-06-news-hitw-dynamic-data
> http://www.pubmedcentral.gov/articlerender.fcgi?artid=61439
> https://tspace.library.utoronto.ca/handle/1807/4677
> http://www.jamia.org/cgi/content/abstract/7/5/475
Thanks - we have copies of the latter three papers but I hadn't seen the first article.

Of course, PostgreSQL muddies the waters, because the way it works under the bonnet (hood, engine cover) is rather similar to (but not identical to) the EAV model - but all that is hidden behind the SQL interface, which is not easy to bypass. We really wanted to use openEHR when we started in 2003 - openEHR can be seen as a very sophisticated metadata layer which can be used with an EAV-like back-end storage schema - but no openEHR storage engines were available then, and when I asked again earlier this year, there were still none available (as open source, or closed source on a commercial basis) in a production-ready form.

Anyway, plain old PostgreSQL tables work rather well, and are fast and reliable for large datasets - but I now think we will need to build our own replication engine. What we really need is multi-master DB replication which can cope with slow and unreliable networks (hence it has to use asynchronous updates, not tightly-coupled synchronous updates such as multi-phase commits) and with frequent network partitions. If we are funded to do that, then we'll write it in Python, probably using a stochastic "epidemic" model for the data propagation algorithm and some variation on Lamport logical clocks for data synchronisation. It also needs to propagate schema changes. Hopefully we can make it sufficiently general that it has utility for GNUmed, e.g. when a copy of a clinic database is taken away on a laptop for use in the field - say at a nursing home or a satellite clinic - and network connection and synchronisation only occur occasionally. However, we need the replication to scale to 200 to 300 sites. Interestingly, most of the commercial multi-master database replication products just gloss over the issue of data integrity, or leave it up to the application - but research in the 1990s showed that this is not good enough in more complex situations with more than a few master DB instances.

>> - Slony would have worked with that..)

There is a Slony-2 project, being done here in Sydney, but it is focussing on multi-master synchronous updates, i.e. multiple servers in a single data centre, for load-balancing of write tasks as well as read tasks (for which Slony-1 can already be used to facilitate load-balancing).

Sorry to rave on, but don't let anyone tell you that there are no fundamental data management issues left to be addressed by open source or commercial software.

Tim C

_______________________________________________
Gnumed-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnumed-devel
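The EAV trade-off mentioned above can be made concrete with a small sketch. The snippet below is only an illustration with invented names (the patient attributes and the pivot helper are made up for the example, not the actual schema under discussion): the same record held as one conventional "wide" row versus as generic entity-attribute-value triples, plus the pivot step that EAV pushes onto query-time code or a metadata layer (roughly the role an openEHR-style layer would play).

# Hypothetical illustration of conventional vs. EAV storage; not real schema.
from collections import defaultdict

# Conventional table: one column per attribute, easy to query and constrain.
conventional_row = {
    "patient_id": 42,
    "systolic_bp": 128,
    "diastolic_bp": 84,
    "measured_on": "2006-04-29",
}

# EAV: one narrow table of (entity, attribute, value) rows. New attributes need
# no schema change, but typing, constraints and joins all move into application
# or metadata-layer code.
eav_rows = [
    (42, "systolic_bp", "128"),
    (42, "diastolic_bp", "84"),
    (42, "measured_on", "2006-04-29"),
]

def pivot(rows):
    """Rebuild per-entity records from EAV triples (the query-time cost of EAV)."""
    records = defaultdict(dict)
    for entity, attribute, value in rows:
        records[entity][attribute] = value
    return dict(records)

if __name__ == "__main__":
    print(pivot(eav_rows))   # {42: {'systolic_bp': '128', ...}}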

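On the replication idea itself - asynchronous multi-master updates propagated by a stochastic "epidemic" (gossip) process and ordered with some variation on Lamport logical clocks - a minimal, assumption-laden sketch might look like the following. The names (Replica, Row, gossip_round) are hypothetical, and the conflict rule shown (last-writer-wins on the Lamport timestamp, with the site id as tie-breaker) is only one possible variation, not the design actually proposed in the message above.

# Minimal sketch of epidemic propagation plus Lamport clocks; all names hypothetical.
import random
from dataclasses import dataclass

@dataclass
class Row:
    value: str
    clock: int    # Lamport timestamp of the last write
    site: str     # tie-breaker when timestamps are equal

class Replica:
    def __init__(self, site_id: str):
        self.site = site_id
        self.clock = 0                   # local Lamport counter
        self.rows: dict[str, Row] = {}   # primary key -> latest known version

    def write(self, key: str, value: str) -> None:
        """A local update: tick the clock and stamp the row."""
        self.clock += 1
        self.rows[key] = Row(value, self.clock, self.site)

    def merge(self, incoming: dict[str, Row]) -> None:
        """Apply rows received from a peer, keeping the 'latest' version.

        Ordering is (clock, site): a higher Lamport clock wins, with the site
        id as an arbitrary but deterministic tie-breaker, so all replicas
        converge to the same value regardless of message order.
        """
        for key, theirs in incoming.items():
            mine = self.rows.get(key)
            if mine is None or (theirs.clock, theirs.site) > (mine.clock, mine.site):
                self.rows[key] = theirs
            # Receiving a message lifts the local clock to at least the
            # sender's, so the next local write is ordered after it.
            self.clock = max(self.clock, theirs.clock)

def gossip_round(replicas: list["Replica"]) -> None:
    """One 'epidemic' round: every replica pushes its state to one random peer."""
    for r in replicas:
        peer = random.choice([p for p in replicas if p is not r])
        peer.merge(dict(r.rows))

if __name__ == "__main__":
    sites = [Replica(s) for s in ("clinic", "laptop", "satellite")]
    sites[0].write("patient:42:phone", "555-0101")
    sites[1].write("patient:42:phone", "555-0202")   # concurrent, conflicting edit
    for _ in range(10):   # a few gossip rounds; convergence is probabilistic
        gossip_round(sites)
    for r in sites:
        row = r.rows.get("patient:42:phone")
        print(r.site, "->", row.value if row else "not yet replicated")

In a real engine the per-row timestamps would presumably live alongside the PostgreSQL rows, deletes and schema changes would need their own versioned log, and peer selection would have to tolerate sites (such as a laptop in the field) that stay offline for long periods.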