Dear List, quite a number of people have asked me for details of the
work we presented in our Showcase. The showcase was successful, with
15 attendees from SWAHS, NCCH, SESIAHS, NHMRCCTC, MIMS,
Children's Westmead, a Pulse journalist, and a few private representatives.
The discussions were most vigorous around the SNOMED to ICD-10 mapping
and the data mining of the ICU information systems at the RPAH and
Children's Westmead.
As a curiosity, one of the students showed that in one information system
staff opted to use free text to enter the value of a field rather than
choose from a fixed list, and managed to write "room air" in 99
different ways. Moral of the story - don't let people write it their own
way.
If you are going to respond to this message PLEASE remember to remove
all the text of the abstracts from your outgoing response - otherwise
we'll all end up with very long subsequent messages.
cheers
jon patrick
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Foundations of a Data Analytics System for an Intensive Care Unit
Jon Patrick and Glen Pink
School of Information Technologies
Faculty of Engineering
University of Sydney
David Schell, Jonothan Gillis
Pediatric Intensive Care Unit
Children's Hospital, Westmead
Data extraction and the use of data mining techniques on medical records
offer an extensive range of benefits to intensive care units for
research and administration. However, success is greatly dependent on
the quality of the data collected automatically by monitors and the data
manually entered by staff. Furthermore, the task of retrieving
appropriate data from an information system is made all the harder by
very poor interfaces for defining medical terminology and framing
questions to explain research topics. The general problems of creating
an effective data analytics description language and execution engine
need to be grounded in an understanding of the state of play of existing
ICU information systems. This paper reports on an exploration of one such
system to understand the foundations available for attaining a
comprehensive data analytics environment for a clinician.
This study reviews a number of aspects of the data collection and
storage system at the Children's Hospital Westmead Pediatric Intensive
Care Unit (PICU), namely:
- methods and considerations for cleaning collected data, and
- the provision of limited reporting facilities that can be generalised
to encompass collecting and processing all ICU data.
The initial goal was to develop a query engine for the PICU system;
however, a number of problems were encountered, namely:
- the structure of the backend database was internally inconsistent,
- documentation was lacking,
- the various classes of medical data in the database tables had to be
located manually, as the database names are not standardised, and
- the contents of each table could only be estimated.
This insufficiency in the semantics of the database contents resulted
in an inability to automate the building of a general-purpose query
system for the database.
Data oversights and errors were discovered in the database, such as
missing values for when a patient leaves the PICU, typical data entry
errors, and data entry errors due to the automation of data entry, which
can be filtered. Defaults for entered string values are available from
the front-end; however, it is common for incorrectly spelt custom values
to be entered by the user (there are 99 different renderings of the term
"Room Air"). This phenomenon results in a large number of values that
must be filtered and grouped in order to collect together equivalent
attributes so as to present meaningful results. Hence significant manual
input from the medical staff is required to provide appropriate
groupings. However, automated grouping must play some part, as fully
manual grouping is not feasible given the size of the database.
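The kind of automated grouping described above - collapsing the many
spellings of "Room Air" onto one canonical value before manual review -
can be sketched roughly as follows (function names and the
normalisation rules are illustrative assumptions, not the project's
actual algorithm):

```python
import re
from collections import defaultdict

def canonical_key(value: str) -> str:
    """Reduce a free-text entry to a crude canonical key:
    lowercase, strip punctuation, collapse whitespace."""
    key = re.sub(r"[^a-z0-9 ]", " ", value.lower())
    return re.sub(r"\s+", " ", key).strip()

def group_variants(values):
    """Group spelling variants of the same concept under one key.
    Keys that normalisation alone cannot unify (e.g. "RoomAir")
    still require manual merging by medical staff."""
    groups = defaultdict(list)
    for v in values:
        groups[canonical_key(v)].append(v)
    return dict(groups)

entries = ["Room Air", "room air", "Room-Air", "ROOM AIR ", "RoomAir"]
groups = group_variants(entries)
```

Note that "RoomAir" survives as a separate group, which is why the
abstract argues that manual input from medical staff must complement
the automated step.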
In coordination with medical staff, a set of algorithms was developed to
remove noise from the data by allowing for certain concessions for data
entry error. Given automated data entry, incorrect data may persist for
several hours and can only be filtered to a certain degree. Thresholds
for different error types were determined primarily by limits for
reporting requirements; however, final discretion for these thresholds
lies in the hands of the end user. Once noise is removed, the data can be
appropriately displayed in reports as required by the end user.
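A minimal sketch of the threshold-based filtering described above, with
entirely hypothetical limits (the real thresholds were set by reporting
requirements and end-user discretion):

```python
def filter_noise(readings, lower, upper):
    """Drop automatically collected readings outside a plausible
    range; the thresholds are supplied by the end user."""
    return [r for r in readings if lower <= r <= upper]

# e.g. heart-rate samples with obvious sensor glitches (0 and 310)
hr = [112, 108, 0, 115, 310, 110]
clean = filter_noise(hr, lower=30, upper=250)
```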
Medical staff assisted in the interpretation of the data in the database
so as to create appropriate documentation. This led to producing a
small data analytics system which is significantly enhanced by the use
of data cleaning techniques. However, to obtain optimum results it is
necessary to enforce consistency in the nomenclature of entered data. On
top of this work it will be possible to build data mining and natural
language processing approaches, including trend analysis and forecasting,
developing into a full data analytics system for ICUs.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A Data Analytics Environment for an ICU Information System
Jon Patrick and Victor Chan
School of Information Technologies
Faculty of Engineering
University of Sydney
Robert Herkes and Angela Ryan
Intensive Care Unit
Royal Prince Alfred Hospital
The long term objective of this research is to understand the nature of
the analytical questions the physicians wish to have answered in their
various roles as clinician, or researcher, or administrator. In the
interim the task is to understand the routine way in which the
information system (IS) is used, the data stored in it, and the workflow
processes that revolve around it. This particular study has demonstrated
two instances of these objectives by the generation of administrator
reports, that is, correctly retrieving ventilation hours, and likewise,
one sample clinical case, that is, auto-population of daily assessment
reports. The outcomes of these tests represent the first steps in a
long-term plan to create a generalized data analytics engine for ICU
information systems.
At the Royal Prince Alfred Hospital, the clinical information system,
CareVue, has a front-end that allows staff to enter a patient’s clinical
information into a database. Behind this front-end is a back-end
which uses two separate databases: the real-time database and the
archival or historical database. While the real-time database stores
data only for patients who are currently or recently admitted to
the ICU, the historical data warehouse stores the data of every patient
that has been admitted to the ICU since 2002.
The first stage of the project involved working on the historical
database, which is configured much more like a data warehouse. In
order to understand the general needs for data analytics we
concentrated on a particular question, namely the profile of
ventilation of patients. For example, a question of interest is
“finding the total number of invasive and non-invasive hours that a
particular patient has spent during their stay in the unit.” In order to
produce a generalised strategy for asking all similar questions, a deep
understanding of the database architecture was required; an efficient
method for data extraction was then developed to present this data
in management reports or in a user interface.
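The ventilation-hours question above can be sketched against a toy
schema. The real CareVue warehouse schema is undocumented, so the table
and column names here are purely illustrative assumptions, and sqlite3
stands in for the actual database engine:

```python
import sqlite3

# Hypothetical, simplified schema -- illustrative only, not CareVue's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ventilation_episodes (
    patient_id INTEGER,
    mode TEXT,    -- 'invasive' or 'non-invasive'
    hours REAL
);
INSERT INTO ventilation_episodes VALUES
    (1, 'invasive', 36.5), (1, 'non-invasive', 12.0),
    (1, 'invasive', 5.5),  (2, 'invasive', 48.0);
""")

# Total invasive and non-invasive hours for one patient's stay
rows = conn.execute("""
    SELECT mode, SUM(hours)
    FROM ventilation_episodes
    WHERE patient_id = ?
    GROUP BY mode
""", (1,)).fetchall()
totals = dict(rows)
```

The hard part reported in the abstract is not the aggregation itself
but discovering which of the hundreds of undocumented tables hold the
episodes and how they join.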
The next stage of the project involved working on the real-time
database. The main task was to auto-populate a pre-defined template of
data for the General-ICU AM Ward Round with the corresponding data for
any patient. The first step in this task required different SQL queries
to retrieve all the relevant data. The second step required storage of
all this data in a temporary data store, and the final step required
auto-population of the template with the corresponding data in the data
store.
The main problems to emerge were a direct result of the lack of
documentation of the CareVue system. With the real-time database, even
though there are only about 100 patients, over 300 tables are used in
the database. Consequently, the main problem is identifying
where specific data are stored and how to link the tables
together to extract the data of interest. With the historical database,
since it stores over 11,000 patients, the main problem is retrieval
time. Identifying the location of specific data is difficult and the
solution comes from an exhaustive trial-and-error search. Only by
manually looking through each table can the meaning of data be established.
For future work, we wish to perform both real-time and historical
analysis simultaneously. By linking both the real-time and the
historical databases together, we will then have the ability to compare
a current case with the aggregated values over many cases.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Towards the Automatic Annotation of Medical Texts
Jon Patrick and Jessica Thallmaier
School of Information Technologies
Faculty of Engineering
University of Sydney
Kerrie McDonald
Kolling Institute of Medical Research
Royal North Shore Hospital
The aim of this project is to facilitate the creation of a system that
is able to automatically annotate oncologists' clinical notes about
brain tumour patients using Natural Language Processing (NLP).
The project has been sponsored by the Kolling Institute as part of their
genetic research program. Their ultimate goal through this project is to
build a computational model that incorporates the clinical notes of a
patient and the results of microarray analysis from their tumour tissue.
They hope to find relationships between certain gene expressions in the
patients and their response to specific medical treatments. This
research would enable the medical community to better tailor the
treatment of individual brain tumour sufferers to improve quality of
life and increase survival time. The automatic annotation of texts in
their natural form will be an invaluable step towards this goal,
allowing researchers to effectively analyse vast quantities of archived
data.
We used the methodology of manual annotation by several knowledgeable
individuals using a set of 34 annotation tags. These tags are agreed
upon by the various annotators, with specific rules for their application
to the data. While most tags apply to whole sentences and sections of
text, we allow for tags to be nested within other tags. Examples of such
tags are "chemotherapy medication" assigned within "chemotherapy", and
pathology data with its subsequent "tumour type", "tumour location",
and "tumour stage" tags. Annotators are required to adhere strictly to the
rules of each tag so as to produce the greatest level of inter-annotator
agreement.
We then employ machine learning techniques to automatically
annotate the same sample of clinical documents. We evaluate the success
of the processing by the level of precision and recall we are able to
achieve.
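The precision/recall evaluation mentioned above amounts to comparing
predicted annotations against the manual gold standard. A minimal
exact-match scorer (the tuple representation is an assumption for
illustration):

```python
def precision_recall(predicted, gold):
    """Score predicted annotations against the manual gold standard.
    Both are sets of (tag, start, end) tuples; exact-match scoring."""
    tp = len(predicted & gold)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {("chemotherapy", 0, 48), ("tumour type", 60, 75),
        ("tumour location", 80, 95)}
pred = {("chemotherapy", 0, 48), ("tumour type", 60, 74)}  # off by one
p, r = precision_recall(pred, gold)
```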
In the future we aim to expand the current annotated corpus and to
develop more accurate and precise methods to reliably reproduce the
annotations automatically. We analyse the text and the manual
annotations to identify possible sentence markers that attribute
certain tags to the text. In future we intend to use the Text to
SnomedCT (SCT) software developed by the team, which will allow us to use
the medical concepts to identify and automatically annotate samples of
our text.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A Generative Hospital Information Management System with Patient Tracking
William Chau and Jon Patrick
School of Information Technologies
Faculty of Engineering
University of Sydney
Research into a system that automatically generates local information
systems has resulted in the production of an experimental Hospital
Information System. This system provides for the electronic mimicking of
hospital forms, so that electronic data entry for patient recording
does not have to be altered from the current paper-based system. Such an
approach goes a long way towards supporting staff in transferring to a
full electronic record-keeping system with a minimum of dislocation.
In parallel with the record management system we have developed a
patient workflow system that automatically pushes the patient record,
and by implication the patient, from one process to the next when they
are finished with the current process. A clinical department then has the
metaphorical role of a train line along which the patient travels,
getting off at each point at which information has to be collected about
them and pushed back on when they have to move to the next process.
Following this metaphor, a hospital resembles a train network of
interconnecting train lines, where passengers either complete their
journey on the one line or change from one line (that is, department) to
another as the processing requirements demand.
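The train-line metaphor can be sketched as a department holding an
ordered sequence of processes, with completion of one process pushing
the patient to the next. The class and the process names below are
illustrative assumptions, not the prototype's actual design:

```python
class DepartmentLine:
    """A clinical department as a 'train line': an ordered sequence
    of processes (stations). Completing one process pushes the
    patient record on to the next."""
    def __init__(self, name, processes):
        self.name = name
        self.processes = list(processes)
        self.position = {}  # patient_id -> index of current process

    def admit(self, patient_id):
        self.position[patient_id] = 0

    def complete_current(self, patient_id):
        """Finish the current process; return the next process name,
        or None once the patient reaches the end of the line."""
        i = self.position[patient_id] + 1
        if i >= len(self.processes):
            del self.position[patient_id]  # ready to change lines
            return None
        self.position[patient_id] = i
        return self.processes[i]

ed = DepartmentLine("Emergency",
                    ["triage", "assessment", "treatment", "disposition"])
ed.admit(42)
nxt = ed.complete_current(42)  # patient pushed from triage onwards
```

Because every push is recorded, the network schematic described below
falls out of the same data.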
The implementation of this metaphor as a prototype Information System
with complete patient records has been completed. An emergent property
of the system is the tracking of the patient throughout the hospital,
which is simple, easy to implement, of very low cost, and requires only
a tiny amount of clerical time. The gain, however, is significant: the
CEO's dashboard, and that of every manager beneath them, would have a
network schematic showing where each patient is located in the network,
thereby identifying waiting rates, throughput rates and blockages either
within a department or across the whole hospital.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparison of Description Logic Reasoners for SNOMED CT
Jon Patrick and Varun Srivastrava
School of Information Technologies
Faculty of Engineering
University of Sydney
Donna Truran, Ming Zhang
National Centre for the Classification in Health
University of Sydney
Description Logics belong to the family of logic-based knowledge
representation languages which can be used to characterise the
terminological knowledge of an application domain like SNOMED CT. SNOMED
CT is a comprehensive set of concepts, terms and codes containing more
than 360,000 concepts, 450,000 medical descriptions and 1,200,000
concept relationships. Classifying such a large terminology and
establishing its trustworthiness is a great challenge for computer
scientists.
The reasoners considered in this study are:
1. CEL: Lisp-based reasoner for EL. It accepts inputs in a small
extension of the KRSS syntax.
2. FaCT++: C++-based reasoner for SHOIQ. It accepts inputs in OWL-DL and
supports the DIG API.
3. RacerPro: Lisp-based reasoner for SHIQ. It supports the OWL API and
the DIG API.
Comparing these reasoners is contentious, as they use different
ontological notations, and understanding which reasoner performs better
(in accuracy) for each style of ontology is difficult. Some of these
ontologies are not fully convertible to one another, and comparison
becomes difficult when the ontology used for the evaluation is very
large. Other bases for reasoner comparison include factors such as the
time the reasoner takes to perform the classification task.
The SNOMED CT stated view (unclassified SNOMED) is maintained in the
standard KRSS notation. The reasoners under study use either the EL+ or
the OWL ontology notation. The SNOMED CT stated view was converted
manually to the EL format accepted by the CEL reasoner. FaCT++ and
RacerPro use the OWL ontology notation, so the stated view in KRSS
syntax was converted to the OWL notation by programming a conversion
routine.
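A toy sketch of the kind of syntactic rewriting such a conversion
routine performs, for the simplest case of a KRSS primitive-concept
definition rendered as an OWL functional-syntax axiom. This is an
illustration of the idea only; real SNOMED CT axioms also involve
conjunctions and existential role restrictions, which the actual
routine must handle:

```python
import re

def krss_primitive_to_owl(line: str) -> str:
    """Rewrite a simple KRSS definition such as
    (define-primitive-concept Child Person)
    into an OWL functional-syntax SubClassOf axiom."""
    m = re.match(r"\(define-primitive-concept\s+(\S+)\s+(\S+)\)", line)
    if not m:
        raise ValueError("unsupported KRSS form")
    sub, sup = m.groups()
    return f"SubClassOf(:{sub} :{sup})"

axiom = krss_primitive_to_owl("(define-primitive-concept Child Person)")
```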
A number of experiments were conducted firstly to assess the differences
between the three systems using a number of ontologies. Forest and Med
are very small.
Example      CEL (sec)       FaCT++ (sec)    RacerPro (sec)
Forest       0.010           0.791           0.611
Med          0.010           0.440           0.311
Part-whole   0.010           0.250           0.220
Galen        9               92.764          327.01
SNOMED CT    32 min 54 sec   -               -

Table: Time taken to process some trial ontologies of various sizes.
The output hierarchies of the reasoners are the same for the Forest and
Med examples. The computed hierarchy of CEL differs from those of FaCT++
and RacerPro in the case of the part-whole ontology.
Considering the time each reasoner took to classify the ontology, CEL
appears to perform the best, but it produces a hierarchy which differs
from those of the other two reasoners. CEL also appears to be
satisfactory for processing a large ontology like SNOMED. Future work
will compare the outputs of the three reasoners to understand how they
generate different universes of knowledge and the consequences that may
have for reasoning in a health information system.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Mapping SNOMED CT to ICD-10-AM
Jon Patrick and Kenneth Chik
School of Information Technologies
Faculty of Engineering
University of Sydney
Donna Truran, Ming Zhang
National Centre for the Classification in Health
University of Sydney
The primary focus of this project is to create an automatic mapping
program to map from SNOMED CT (SCT) to ICD-10-AM. This mapping is
important, because it enables the automatic conversion of knowledge
stored in SNOMED CT to be represented in ICD-10-AM, without needing to
manually recode it.
Moving beyond a simple lexical matching approach, our mapping algorithm
uses the hierarchy of SNOMED CT to obtain more information for a more
accurate lexical match. The appropriate amount of generalisation to use
as additional information is determined by a triangulation technique. A
mapping from SCT to ICD-9-CM using published tables is performed, and
then a reverse mapping from ICD-9-CM back to a more generalised concept
higher in the SCT ontology is performed via the UMLS Metathesaurus maps.
All the descriptions of all the concepts in the SCT ontology tree below
this more generalised concept are then used as the text to find concept
matches in the ICD-10-AM hierarchy. A Bayesian statistical technique is
used for the lexical matching algorithm.
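To make the final step concrete, a Bayesian-style lexical match can be
sketched as scoring each candidate target description by smoothed token
likelihoods. This is an illustration of the general idea only, with
made-up descriptions and a made-up vocabulary size, not the project's
actual algorithm:

```python
import math
from collections import Counter

def match_score(source_tokens, target_tokens, vocab_size=10000):
    """Naive Bayes-style log-likelihood of the source text under a
    unigram model of the target description, with add-one smoothing."""
    counts = Counter(target_tokens)
    total = len(target_tokens)
    return sum(
        math.log((counts[t] + 1) / (total + vocab_size))
        for t in source_tokens
    )

# Text gathered from the SCT subtree vs. two candidate ICD descriptions
source = "malignant neoplasm of stomach".split()
cand_a = "malignant neoplasm of stomach unspecified".split()
cand_b = "benign neoplasm of colon".split()
best = max([cand_a, cand_b], key=lambda c: match_score(source, c))
```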
This approach produced an increase in both mapping accuracy and speed
when compared to previous attempts at the problem at the University of
Sydney.
The technique developed is not specific to SNOMED CT and ICD-10-AM. It
is flexible enough to be used to convert from SNOMED CT, or other rich
terminologies, ontologies, and classification systems (TOCs), to targets
other than ICD-10-AM, provided there is an intermediate coding scheme
available for use.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Data Retrieval from the HIE and CARDS for Modelling Access Block Times
Peng Gao
The aim of this study was to tackle two problems: firstly, to perform
data mining on the information systems of the Emergency Department and
the Cardiology Network of Westmead Hospital to build a model of the
interaction between the two departments and their effects on Access
Block times; and secondly, to develop methods to compute the KPIs
developed in the KPI project.
The work proceeded in two stages. The first stage involved studying the
data models of the HIE database to identify the Emergency Department
data, and a study of the CARDS data model to extract the data about
cardiology patients. The second stage was to link the records between
the two systems.
The final step in this work is to build the computational model of
access block timings using the linked records. This will be completed by
another project.
begin:vcard
fn:Jon Patrick
n:Patrick;Jon
org:Faculty of Engineering;School of Information Technologies
adr;dom:;;University of Sydney
title:Chair of Language Technology
x-mozilla-html:FALSE
url:http://www.it.usyd.edu.au/~jonpat
version:2.1
end:vcard
_______________________________________________
Gpcg_talk mailing list
[email protected]
http://ozdocit.org/cgi-bin/mailman/listinfo/gpcg_talk