On Fri, Feb 17, 2006 at 11:59:51PM -0800, Jim Busser wrote: > >Ha ! Do you think we should add check constraints on some > >such fields ? IOW "unlimited" length is fine but no CR/LF/FF ? > > Not sure. Some postal systems do have strict requirements that > specify, for each address "part", the line on which the part should > fall as well as the sequence within a line. So that purpose would be > served by such a constraint. We need to be very careful here not to jump to invalid conclusions. The postal system constraints you mention have nothing to do with storing the data itself. They are simply one way (and in the case of an envelope possibly the only legal way in some situations) to *present* the data.
The only conclusion I can draw from postal systems that touch upon data storage are how finely grained I will have to break apart data in the database. Suppose this contrived example: German postal system says: "ALWAYS put zip and town onto the same line like so: '04318 Leipzig'". Canadian postal system says: "Always put town onto one line and zip code onto the next line like so: 'Calgary, 132415'" >From this I can draw the conclusion that I must store zip code and town name in separate fields - even though I am a German programmer and a single field would do for my requirements. How I then put this onto an envelope is by putting things back together according to locale specific rules. The backend must assure that I *can* put things back together. It is not prudent to have things *already* put together in the backend because one postal systems requires it that way and we forgot to think of other systems. This whole business is why databases should be "normalized". The *real* question I was posing above was actually: Do people think that a field called "street" or "town" is sufficiently fine-grained for *any* postal system to be able to say: This field is *never* supposed to contain any LF/CR ? My intuition would say, there is not any country which requires us to support multi-line street names etc. However, I was asking you guys to reinforce my suspicion with your experience before I put this constraint into the database itself. > My biggest question/concern probably has to do with importing, for > example porting data over from a legacy system. When importing data > that would normally violate a constraint, would the owner have all of > the following options and are any inherent in postgres or would they > have to be especially programmed? inherent AND programmed, depending on how one goes about it > 1. when importing patients' demographics, can just the "bad" > records/lines be streamed to some kind of log file while accepting > the "good" records, or must it be "all or none"? Either or, depending on how the importer is written. My gut feeling in writing an importer for demographics would be to accept/refuse at the entire-patient level, eg import good patients, reject and log bad ones with a notice as to what the error was. > I recognize that for > whoever would have to fix the "bad" lines, it may be undesirable to > second-stage import "just" the bad lines that have been fixed. Whether it's undesirable or impractical or hard depends on the level of reject. For import of demographics it's probably best to go patient by patient. The logic would then be: either a patient is there in which case a previous demographics import fully succeeded or the patient is not there. > However if the number of "bad" records is small, it may be less work > to just key them in new, inside GNUmed. It is hard to decide appropriate values for "small". Much more so programmatically. > 2. if it is desirable to be "all or none" because (for example) it > may be attractive to fix the records in the legacy app before > importing them, can the "bad" records be easily identified, say, in a > log of some type? Sure. In some cases it may be difficult or impractical to fix records in the legacy app. In such cases it may make a lot of sense to export from the legacy app into a well known format (say, CSV) and fix that, then import it. > 3. or can constraints be turned off (even selectively, so that some > constraints still apply), in order to tolerate the impurities? Yes. > The > constraints would then be turned on for normal business in GNUmed. Turning them back on would not better any of the data. It might actually be impossible if the data does not actually match the constraint. > I > don't know if this would require that the tables be "ALTERed" before > and after or if there is a "switch" definition that would achieve > this during the import. "ALTER table drop constraint" etc. > ---> I know Karsten may cringe at this question (though I suppose it > is interesting technically just to know the answer). But what I am > thinking is that there may be records (patients) who are no longer in > the practice, in whose data there is not the same value 'cleaning". A very valid point. It would then need to be decided whether to export only those failing-but-not-all-that-important patients into an intermediate format which can/could be cleaned up over time/when needed and imported. I would not allow invalid data to be put into the database. Yes, this may present hard problems to solve, eg what to do about an old address for a country which does not exist anymore ? The technical solution is to adjust the country value to the successor country but that does not really "feel" all to correct. > So if for example the consequence would appear when trying to print > an envelope, would people rather have their office/surgery staff > repair the data at the time of the problem? They'd never do so as I know from experience unless forced by some means. In some cases it may be prudent to have the original, legacy data sitting in some unconstrained staging tables until the need for using them arises at which point staff can use that data in a controlled transfer to the real tables in which process any necessary cleanup would have to be forced upon them by the workflow. > Maybe #2 is the best approach, but it is helpful to know the pros and > cons including feasibility of the other options. The ability to a > particular installation of GNUmed to accept some data impurity would > be no adverse reflection on the purity of design. Well, we all know that the "can't happen" cases do happen regardless. So, the strategy is to think hard about what "should not happen" and turn that into a "throw error if it does happen" and if it happens rethink the design to some extent. > It is not a basis > to "forbid" anyone to do something that is in their control. But the > collaborative/social consequence of deviation would be for some > GNUmedders to contribute help requests or problem reports that may > not have needed to exist. *This* IMO would be a basis to ask people > to conform to desired methods. Well, there's two approaches here: be liberal or be strict I believe we should be fairly strict (and fairly good) with respect to address data since that is needed for patient identification and reimbursement. We should be less strict in some areas of clinical input - that is we MUST allow unconstrained clinical data to be captured along with structured clinical data. Freetext narrative fields do that for us. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346 _______________________________________________ Gnumed-devel mailing list [email protected] http://lists.gnu.org/mailman/listinfo/gnumed-devel
