Re: [Gnumed-devel] catching errors in an app

Karsten Hilbert Sat, 18 Feb 2006 13:56:06 -0800

On Fri, Feb 17, 2006 at 11:59:51PM -0800, Jim Busser wrote:

> >Ha ! Do you think we should add check constraints on some
> >such fields ? IOW "unlimited" length is fine but no CR/LF/FF ?
> 
> Not sure. Some postal systems do have strict requirements that 
> specify, for each address "part", the line on which the part should 
> fall as well as the sequence within a line. So that purpose would be 
> served by such a constraint.
We need to be very careful here not to jump to invalid
conclusions. The postal system constraints you mention have
nothing to do with storing the data itself. They are simply
one way (and in the case of an envelope possibly the only
legal way in some situations) to *present* the data.


The only conclusion I can draw from postal systems that
touch upon data storage are how finely grained I will have
to break apart data in the database. Suppose this contrived
example:

German postal system says: "ALWAYS put zip and town onto the
same line like so: '04318 Leipzig'".

Canadian postal system says: "Always put town onto one line
and zip code onto the next line like so:

'Calgary,
 132415'"

>From this I can draw the conclusion that I must store zip
code and town name in separate fields - even though I am a
German programmer and a single field would do for my
requirements.

How I then put this onto an envelope is by putting things
back together according to locale specific rules.

The backend must assure that I *can* put things back
together. It is not prudent to have things *already* put
together in the backend because one postal systems requires
it that way and we forgot to think of other systems.

This whole business is why databases should be "normalized".

The *real* question I was posing above was actually:

Do people think that a field called "street" or "town" is
sufficiently fine-grained for *any* postal system to be able
to say: This field is *never* supposed to contain any LF/CR ?

My intuition would say, there is not any country which
requires us to support multi-line street names etc. However,
I was asking you guys to reinforce my suspicion with your
experience before I put this constraint into the database
itself.

> My biggest question/concern probably has to do with importing, for 
> example porting data over from a legacy system. When importing data 
> that would normally violate a constraint, would the owner have all of 
> the following options and are any inherent in postgres or would they 
> have to be especially programmed?
inherent AND programmed, depending on how one goes about it

> 1. when importing patients' demographics, can just the "bad" 
> records/lines be streamed to some kind of log file while accepting 
> the "good" records, or must it be "all or none"?
Either or, depending on how the importer is written.

My gut feeling in writing an importer for demographics would
be to accept/refuse at the entire-patient level, eg import
good patients, reject and log bad ones with a notice as to
what the error was.

> I recognize that for 
> whoever would have to fix the "bad" lines, it may be undesirable to 
> second-stage import "just" the bad lines that have been fixed.
Whether it's undesirable or impractical or hard depends on
the level of reject. For import of demographics it's
probably best to go patient by patient. The logic would then
be: either a patient is there in which case a previous
demographics import fully succeeded or the patient is not
there.

> However if the number of "bad" records is small, it may be less work 
> to just key them in new, inside GNUmed.
It is hard to decide appropriate values for "small". Much
more so programmatically.

> 2. if it is desirable to be "all or none" because (for example) it 
> may be attractive to fix the records in the legacy app before 
> importing them, can the "bad" records be easily identified, say, in a 
> log of some type?
Sure. In some cases it may be difficult or impractical to
fix records in the legacy app. In such cases it may make a
lot of sense to export from the legacy app into a well known
format (say, CSV) and fix that, then import it.

> 3. or can constraints be turned off (even selectively, so that some 
> constraints still apply), in order to tolerate the impurities?
Yes.

> The 
> constraints would then be turned on for normal business in GNUmed.
Turning them back on would not better any of the data. It
might actually be impossible if the data does not actually
match the constraint.

> I 
> don't know if this would require that the tables be "ALTERed" before 
> and after or if there is a "switch" definition that would achieve 
> this during the import.
"ALTER table drop constraint" etc.

> ---> I know Karsten may cringe at this question (though I suppose it 
> is interesting technically just to know the answer). But what I am 
> thinking is that there may be records (patients) who are no longer in 
> the practice, in whose data there is not the same value 'cleaning". 
A very valid point. It would then need to be decided whether
to export only those failing-but-not-all-that-important
patients into an intermediate format which can/could be
cleaned up over time/when needed and imported. I would not
allow invalid data to be put into the database. Yes, this
may present hard problems to solve, eg what to do about an
old address for a country which does not exist anymore ? The
technical solution is to adjust the country value to the
successor country but that does not really "feel" all to
correct.

> So if for example the consequence would appear when trying to print 
> an envelope, would people rather have their office/surgery staff 
> repair the data at the time of the problem?
They'd never do so as I know from experience unless forced
by some means.

In some cases it may be prudent to have the original, legacy
data sitting in some unconstrained staging tables until the
need for using them arises at which point staff can use that
data in a controlled transfer to the real tables in which
process any necessary cleanup would have to be forced upon
them by the workflow.

> Maybe #2 is the best approach, but it is helpful to know the pros and 
> cons including feasibility of the other options. The ability to a 
> particular installation of GNUmed to accept some data impurity would 
> be no adverse reflection on the purity of design.
Well, we all know that the "can't happen" cases do happen
regardless. So, the strategy is to think hard about what
"should not happen" and turn that into a "throw error if it
does happen" and if it happens rethink the design to some
extent.

> It is not a basis 
> to "forbid" anyone to do something that is in their control. But the 
> collaborative/social consequence of deviation would be for some 
> GNUmedders to contribute help requests or problem reports that may 
> not have needed to exist. *This*  IMO would be a basis to ask people 
> to conform to desired methods.
Well, there's two approaches here:

be liberal or
be strict

I believe we should be fairly strict (and fairly good) with
respect to address data since that is needed for patient
identification and reimbursement.

We should be less strict in some areas of clinical input -
that is we MUST allow unconstrained clinical data to be
captured along with structured clinical data. Freetext
narrative fields do that for us.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


_______________________________________________
Gnumed-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnumed-devel

Re: [Gnumed-devel] catching errors in an app

Reply via email to