David wrote:
> But the problem with MHRecord lies in its unknown objectives
Please explain in concrete language what you imagine these "unknown objectives" might be and how they might hurt me. It sounds very like fairies-at-the-bottom-of-the-garden talk. Sorry, goblins.

> Longitudinal studies have to be reasonably well-controlled to be reliable,
> and a collection of random PDFs is unlikely to cut it.

Longitudinal studies are not actually controlled studies; they are a different kind of study. I'm not 100% sure what sense you are using the word "random" in here, unless it is just a generalised pejorative.

The data in MyHR is not complete. However, completeness is rare in experimental data sets in medical science, and in science generally. A slew of statistical methods has been developed to deal with incomplete data sets. Google and Facebook have been incredibly economically successful working with incomplete data sets; however, their primary objective is to sell stuff, not to improve population health. Big data has been incredibly successful in lots of areas and there is no good reason to think it won't work in health science, or health economics. As a matter of fact, big data is already being used successfully in health; go look.

"PDFs" also appears to be a pejorative term here. Just so you, or anyone still tuned in, knows, I'll explain it: the basic reason why PDFs were used is that they are the existing system. Doctors look at text records. Text is what thousands of bits of healthcare software in hospitals and labs produce. It's the format that gets checked and approved.

Ideally, from an abstract data perspective at least, health records would use some kind of structured, XML-like format that represents each item clearly and unambiguously. There are two primary problems: the scale of change on the source side, and creating the data standards. There is no unified common standard for naming medical symptoms or diagnoses. Names change from place to place. Standardisation requires doctors to change the names of their diagnoses.
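Since health records are, for now, free text, pulling a given element out of a report is essentially a pattern-matching problem. A minimal sketch of the idea in Python follows; the report text, analyte name, and pattern are all invented here purely for illustration, not taken from any real pathology system.

```python
import re

# Hypothetical free-text blood test report, as might come out of a PDF
# after running any text-extraction tool over it. The content is made up.
report = """
PATHOLOGY REPORT
Haemoglobin:   142 g/L    (ref 130-170)
Sodium:        139 mmol/L (ref 135-145)
Note: sample slightly haemolysed.
"""

# A tolerant pattern: analyte name, optional colon, flexible whitespace,
# a numeric value (integer or decimal), then the expected unit.
pattern = re.compile(r"Haemoglobin\s*:?\s*(\d+(?:\.\d+)?)\s*(g/L)")

match = pattern.search(report)
if match:
    value, unit = float(match.group(1)), match.group(2)
    print(value, unit)
```

Real report layouts vary far more than this, of course, which is exactly why the Google street-number comparison is apt: the task is messy but tractable, and imperfect extraction is acceptable at research scale.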
Similarly, medical testing is done differently from place to place, using different standards and different equipment. Results are often annotated to indicate problems with a sample or an interpretation. The process has multiple checks to ensure reliability, culminating in the checking and sign-off of the final text by a senior clinician. The clinician does not sign off an XML data set, and they would be rightly wary of attaching their sign-off to one. There are ongoing moves towards standardisation and towards abstracting data from its presentation, but these are slow and careful processes that will take years. We are stuck with PDFs for some time.

Do PDFs present a problem for researchers? Yes. Do they think they can handle it? Yes. If Google can reliably determine street numbers in all kinds of formats from photos, extracting a particular data element from a PDF blood test will be relatively easy. The data doesn't have to be perfect; real-world datasets are not perfect.

What researchers are excited by is the numbers. Rather than running an expensive longitudinal study or RCT over a few hundred participants that struggles to achieve statistical significance, they are looking at n = 100,000 or 5,000,000 real-world trials. The data is of course different: weaker in many respects but stronger in others. Meshing epidemiological studies with trials is normal in medical science, and we can expect to see more good epidemiological studies. Epidemiological studies are highly regarded in medical science for very good reasons that I won't go into, but you can check this if you are interested.

Jim

_______________________________________________
Link mailing list
[email protected]
http://mailman.anu.edu.au/mailman/listinfo/link
