Dear List,

Quite a number of people have asked me for details of the work we are presenting in our Showcase. The showcase was successful, with 15 attendees from SWAHS, NCCH, SESIAHS, NHMRCCTC, a Pulse journalist, MIMS, Children's Westmead and a few private representatives. The discussions were most vigorous around the SNOMED to ICD 10 mapping and the data mining of the ICU information systems at the RPAH and Children's Westmead. As a curiosity, one of the students showed that in one information system staff opted to use free text to enter the value of a field rather than choose from a fixed list, and managed to write "room air" in 99 different ways. Moral of the story: don't let people write it their own way.

If you are going to respond to this message PLEASE remember to remove all the text of the abstracts from your outgoing response - otherwise we'll all end up with very long subsequent messages.
cheers
jon patrick
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Foundations of a Data Analytics System for an Intensive Care Unit

Jon Patrick and Glen Pink
School of Information Technologies
Faculty of Engineering
University of Sydney

David Schell, Jonothan Gillis
Pediatric Intensive Care Unit
Children's Hospital, Westmead

Data extraction and the use of data mining techniques on medical records offer an extensive range of benefits to intensive care units for research and administration. However, success is greatly dependent on the quality of the data collected automatically by monitors and the data manually entered by staff. Furthermore, the task of retrieving appropriate data from an information system is made all the harder by very poor interfaces for defining medical terminology and framing questions that express research topics. The general problems of creating an effective data analytics description language and execution engine need to be grounded in an understanding of the state of play of existing ICU information systems. This paper reports on an exploration of one such system to understand the foundations available for attaining a comprehensive data analytics environment for a clinician.

This study reviews a number of aspects of the data collection and storage system at the Children's Hospital Westmead Pediatric Intensive Care Unit (PICU), namely:
- methods and considerations for cleaning collected data,
- the provision of limited reporting facilities that can be generalised to encompass collecting and processing all ICU data.

The initial goal was to develop a query engine for the PICU system; however, a number of problems were encountered, namely:
- the structure of the backend database was internally inconsistent,
- there was a lack of documentation,
- the various classes of medical data had to be located in the database tables manually, as the table names are not standardised,
- the contents of each table had to be estimated.
This insufficiency in the semantics of the database contents resulted in an inability to automate the building of a general-purpose query system for the database.

Data oversights and errors were discovered in the database, such as: missing values for when a patient leaves the PICU; typical data entry errors; and data entry errors arising from the automation of data entry. Default string values are available from the front-end; however, users commonly enter incorrectly spelt custom values instead (there are 99 different renderings of the term "Room Air"). This phenomenon results in a large number of values that must be filtered and grouped in order to collect together equivalent attributes and present meaningful results. Hence significant manual input from the medical staff is required to provide appropriate groupings. However, automated grouping must play some part, as fully manual grouping is not feasible given the size of the database.
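The grouping step described above can be sketched as follows. A minimal normalisation pass collapses trivially different spellings into one group, while genuinely different renderings still need a manually supplied mapping; the function names and sample values here are invented for illustration, not taken from the actual PICU system:

```python
import re
from collections import defaultdict

def normalise(value: str) -> str:
    """Collapse case, punctuation and whitespace so trivially
    different spellings map to the same key."""
    return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()

def group_equivalents(values):
    """Group raw free-text entries by their normalised form."""
    groups = defaultdict(list)
    for v in values:
        groups[normalise(v)].append(v)
    return dict(groups)

entries = ["Room Air", "room air", "ROOM-AIR", "Room  air", "R/A"]
groups = group_equivalents(entries)
# The first four variants collapse to the single key "room air";
# "R/A" remains separate and still needs a manual mapping.
```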

In coordination with medical staff, a set of algorithms was developed to remove noise from the data by allowing for certain concessions for data entry error. Given automated data entry, incorrect data may persist for several hours and can only be filtered to a certain degree. Thresholds for different error types were determined primarily by the limits of reporting requirements; however, final discretion over these thresholds lies in the hands of the end user. Once noise is removed, the data can be appropriately displayed in reports as required by the end user.
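A minimal sketch of threshold-based noise filtering of the kind described above; the plausibility band and sample values are illustrative only, not the unit's actual clinical thresholds:

```python
def filter_noise(readings, low, high):
    """Replace values outside the clinician-set plausibility band
    with None so they are excluded from reports."""
    return [r if low <= r <= high else None for r in readings]

# Hypothetical heart-rate series: 0 and 310 are sensor artefacts.
heart_rates = [82, 85, 0, 310, 88]
clean = filter_noise(heart_rates, low=30, high=250)
# clean == [82, 85, None, None, 88]
```

The end user retains discretion over `low` and `high`, matching the paper's point that thresholds are ultimately set by reporting requirements.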

Medical staff assisted in the interpretation of the data in the database so as to create appropriate documentation. This led to the production of a small data analytics system that is significantly enhanced by the use of data cleaning techniques. However, to obtain optimum results it is necessary to enforce consistency in the nomenclature of entered data. On top of this work it will be possible to build data mining and natural language processing approaches, including trend analysis and forecasting, developing into a full data analytics system for ICUs.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A Data Analytics Environment for an ICU Information System
Jon Patrick and Victor Chan
School of Information Technologies
Faculty of Engineering
University of Sydney

Robert Herkes and Angela Ryan
Intensive Care Unit
Royal Prince Alfred Hospital

The long term objective of this research is to understand the nature of the analytical questions the physicians wish to have answered in their various roles as clinician, or researcher, or administrator. In the interim the task is to understand the routine way in which the information system (IS) is used, the data stored in it, and the workflow processes that revolve around it. This particular study has demonstrated two instances of these objectives by the generation of administrator reports, that is, correctly retrieving ventilation hours, and likewise, one sample clinical case, that is, auto-population of daily assessment reports. The outcomes of these tests represent the first steps in a long-term plan to create a generalized data analytics engine for ICU information systems.

At the Royal Prince Alfred Hospital, the clinical information system, CareVue, has a front-end that allows staff to enter a patient's clinical information into a database. Behind this front-end is a back-end that uses two separate databases: the real-time database and the archival or historical database. While the real-time database stores data only for patients currently or recently admitted to the ICU, the historical data warehouse stores the data of every patient admitted to the ICU since 2002.
        
The first stage of the project involved working on the historical database, which is configured much more like a data warehouse. In order to understand the general needs for data analytics we concentrated on a particular question, namely the profile of ventilations of patients. For example, a question of interest is "find the total number of invasive and non-invasive ventilation hours that a particular patient has accrued during their stay in the unit." In order to produce a generalised strategy for asking all similar questions, a deep understanding of the database architecture was required; an efficient method for data extraction was then developed to present this data in management reports or in a user interface.
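The ventilation-hours question can be illustrated with a toy query. The table and column names below are invented, since the actual CareVue warehouse schema is undocumented; an in-memory SQLite database stands in for the historical data warehouse:

```python
import sqlite3

# Hypothetical minimal schema; the real warehouse layout is far larger.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ventilation_episode (
    patient_id INTEGER, mode TEXT, hours REAL)""")
conn.executemany(
    "INSERT INTO ventilation_episode VALUES (?, ?, ?)",
    [(1, "invasive", 10.5), (1, "non-invasive", 4.0), (1, "invasive", 6.5)])

# Aggregate invasive and non-invasive hours for one patient's stay.
row = conn.execute("""
    SELECT SUM(CASE WHEN mode = 'invasive' THEN hours ELSE 0 END),
           SUM(CASE WHEN mode = 'non-invasive' THEN hours ELSE 0 END)
    FROM ventilation_episode WHERE patient_id = ?""", (1,)).fetchone()
invasive_hours, non_invasive_hours = row
# invasive_hours == 17.0, non_invasive_hours == 4.0
```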
        
The next stage of the project involved working on the real-time database. The main task was to auto-populate a pre-defined template of data for the General-ICU AM Ward Round with the corresponding data for any patient. The first step in this task required different SQL queries to retrieve all the relevant data. The second step required storage of all this data in a temporary data store, and the final step required auto-population of the template with the corresponding data in the data store.
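The three steps above can be sketched as a small pipeline. All field names, queries and values are hypothetical stand-ins for the real ward-round template; the per-field SQL queries are stubbed out:

```python
def fetch(field, patient_id):
    """Step 1: one query per template field (stubbed with sample data;
    in the real system each field would run an SQL query)."""
    sample = {"name": "A. Patient", "heart_rate": "84", "ventilation": "SIMV"}
    return sample[field]

def build_report(patient_id, template, fields):
    # Step 2: temporary data store keyed by field name.
    store = {f: fetch(f, patient_id) for f in fields}
    # Step 3: auto-populate the template from the data store.
    return template.format(**store)

template = "Ward round: {name} | HR {heart_rate} | Vent {ventilation}"
report = build_report(42, template, ["name", "heart_rate", "ventilation"])
# report == "Ward round: A. Patient | HR 84 | Vent SIMV"
```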
        
The main problems to emerge were a direct result of the lack of documentation of the CareVue system. In the real-time database, even though there are only about 100 patients, over 300 tables are in use. Consequently, the main problem is identifying where specific data are stored and how to link the tables together to extract the data of interest. With the historical database, which stores over 11,000 patients, the main problem is retrieval time. Identifying the location of specific data is difficult and the solution comes from an exhaustive trial-and-error search; only by manually looking through each table can the meaning of the data be established.
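Part of that trial-and-error search can be mechanised by scanning table metadata for likely column names. This sketch uses SQLite introspection as a stand-in for the actual CareVue back-end; the tables and columns are invented:

```python
import sqlite3

def find_columns(conn, keyword):
    """Scan every table's columns for a keyword - a programmatic
    version of the exhaustive manual search described above."""
    hits = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for t in tables:
        for col in conn.execute(f"PRAGMA table_info({t})"):
            if keyword.lower() in col[1].lower():   # col[1] is column name
                hits.append((t, col[1]))
    return hits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs_a (PatientVentHours REAL, x TEXT)")
conn.execute("CREATE TABLE obs_b (y TEXT, vent_mode TEXT)")
hits = find_columns(conn, "vent")
# hits == [('obs_a', 'PatientVentHours'), ('obs_b', 'vent_mode')]
```

This only narrows the search; establishing what the data in each candidate column actually means still requires manual inspection, as the abstract notes.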

For future work, we wish to perform both real-time and historical analysis simultaneously. By linking both the real-time and the historical databases together, we will then have the ability to compare a current case with the aggregated values over many cases.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Towards the Automatic Annotation of Medical Texts
Jon Patrick and Jessica Thallmaier
School of Information Technologies
Faculty of Engineering
University of Sydney

Kerrie McDonald
Kolling Institute of Medical Research
Royal North Shore Hospital

The aim of this project is to facilitate the creation of a system that is able to automatically annotate clinical notes of oncologists about brain tumour patients by Natural Language Processing (NLP).

The project has been sponsored by the Kolling Institute as part of their genetic research program. Their ultimate goal through this project is to build a computational model that incorporates the clinical notes of a patient and the results of microarray analysis from their tumour tissue. They hope to find relationships between certain gene expressions in the patients and their response to specific medical treatments. This research would enable the medical community to better tailor the treatment of individual brain tumour sufferers to improve quality of life and increase survival time. The automatic annotation of texts in their natural form will be an invaluable step towards this goal, allowing researchers to effectively analyse vast quantities of archived data.

We used the methodology of manual annotation by several knowledgeable individuals using a set of 34 annotation tags. These tags were agreed upon by the various annotators, with specific rules for their application to the data. While most tags apply to whole sentences and sections of text, we allow for tags to be nested within other tags. Examples of such tags are "chemotherapy medication" assigned within "chemotherapy", and pathology data with its subsidiary "tumour type", "tumour location" and "tumour stage" tags. Annotators are required to adhere strictly to the rules of each tag so as to produce the greatest level of inter-annotator agreement.
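One possible representation of such nested tags uses character-offset spans, sketched here with an invented example sentence and offsets (not taken from the project's corpus):

```python
# Hypothetical clinical note and annotation spans.
note = "Commenced temozolomide as adjuvant chemotherapy."

annotations = [
    {"tag": "chemotherapy",            "start": 0,  "end": 48},
    {"tag": "chemotherapy medication", "start": 10, "end": 22},
]

def is_nested(inner, outer):
    """True if the inner span lies entirely inside the outer span."""
    return outer["start"] <= inner["start"] and inner["end"] <= outer["end"]

# The medication span sits inside the chemotherapy span:
# is_nested(annotations[1], annotations[0]) -> True
# note[10:22] -> "temozolomide"
```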

We then employ machine learning techniques to automatically annotate the same sample of clinical documents. We evaluate the success of the processing by the levels of precision and recall we are able to achieve.
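The evaluation against the manual gold standard can be computed as below; the annotation triples are hypothetical examples:

```python
def precision_recall(predicted, gold):
    """Compare automatic annotations against the manual gold standard.
    Each annotation is a (start, end, tag) triple."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                       # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {(0, 48, "chemotherapy"), (10, 22, "chemotherapy medication")}
pred = {(0, 48, "chemotherapy"), (30, 48, "tumour type")}
p, r = precision_recall(pred, gold)
# p == 0.5 (1 of 2 predictions correct), r == 0.5 (1 of 2 gold found)
```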

In the future we aim to expand the current annotated corpus and to develop more accurate and precise methods to reliably reproduce the annotations automatically. We analyse the text and the manual annotations to identify possible sentence markers that attribute certain tags to the text. We also intend to use the Text to SNOMED CT (SCT) software developed by the team, which will allow us to use the identified medical concepts to automatically annotate samples of our text.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A Generative Hospital Information Management System with Patient Tracking
William Chau and Jon Patrick
School of Information Technologies
Faculty of Engineering
University of Sydney

Research into a system that automatically generates local information systems has resulted in the production of an experimental Hospital Information System. This system provides for the electronic mimicking of hospital forms, so that electronic data entry for patient recording does not have to be altered from the current paper-based system. Such an approach goes a long way towards supporting staff in transferring to a full electronic record-keeping system with a minimum of dislocation.

In parallel with the record management system we have developed a patient workflow system that automatically pushes the patient record, and by implication the patient, from one process to the next when the current process is finished. A clinical department then has the metaphorical role of a train line along which the patient travels, getting off at each point at which information has to be collected about them and pushed back on when they have to move to the next process. Following this metaphor, a hospital resembles a train network of interconnecting lines, where passengers either complete their journey on the one line or change from one line (that is, department) to another as the processing requirements demand.
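The train-line metaphor can be sketched as a minimal state machine; the class, department and station names below are illustrative, not drawn from the actual prototype:

```python
class DepartmentLine:
    """A department as a train line: an ordered sequence of processes
    ("stations") through which a patient record is pushed."""

    def __init__(self, name, stations):
        self.name = name
        self.stations = stations          # ordered stops on the line
        self.position = {}                # patient -> current station index

    def admit(self, patient):
        self.position[patient] = 0        # board at the first station

    def complete_current(self, patient):
        """Finish the current process and push the record to the next
        station; returns None when the journey on this line is done."""
        self.position[patient] += 1
        if self.position[patient] >= len(self.stations):
            del self.position[patient]
            return None
        return self.stations[self.position[patient]]

line = DepartmentLine("Radiology", ["reception", "imaging", "reporting"])
line.admit("patient-7")
# Completing each process moves the record: reception -> imaging -> reporting.
```

A dashboard view then amounts to reading out `position` across all lines, which is how the network schematic of patient locations described below could be derived.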

The implementation of this metaphor as a prototype Information System with complete patient records has been completed. An emergent property of the system is that tracking the patient throughout the hospital becomes simple, easy to implement, of very low cost, and requires only a tiny amount of clerical time. The gain is, however, significant: the CEO's dashboard, and that of every manager beneath them, would show a network schematic of where each patient is located in the network, and thereby identify waiting rates, throughput rates and blockages either within a department or across the whole hospital.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparison of Description Logic Reasoners for SNOMED CT
Jon Patrick and Varun Srivastrava
School of Information Technologies
Faculty of Engineering
University of Sydney

Donna Truran, Ming Zhang
National Centre for the Classification in Health
University of Sydney

Description Logics belong to the family of logic-based knowledge representation languages that can be used to characterise the terminological knowledge of an application domain like SNOMED CT. SNOMED CT is a comprehensive set of concepts, terms and codes containing more than 360,000 concepts, 450,000 medical descriptions and 1,200,000 concept relationships. Classifying such a large terminology and establishing its trustworthiness is a great challenge for computer scientists.

The reasoners considered in this study are:

1. CEL: a LISP-based reasoner for EL. It accepts inputs in a small extension of KRSS syntax.
2. FaCT++: a C++-based reasoner for SHOIQ. It accepts inputs in OWL-DL and supports the DIG API.
3. RacerPro: a LISP-based reasoner for SHIQ. It supports the OWL API and the DIG API.

Comparing these reasoners is contentious because they use different ontological notations, and understanding which reasoner performs better (in accuracy) for each style of ontology is difficult. Some of these ontologies are not fully convertible to one another, and comparison becomes harder still when the ontology used for the evaluation is very large. Other bases for reasoner comparison include factors such as the time the reasoner takes to perform the classification task.

The SNOMED CT stated view (unclassified SNOMED) is maintained in the KRSS standard notation. The reasoners under study use either EL+ or OWL ontology notation. The SNOMED CT stated view was converted manually to the EL format accepted by the CEL reasoner. FaCT++ and RacerPro use the OWL ontology notation, so the stated view in KRSS syntax was converted to OWL notation by programming a conversion routine. A number of experiments were conducted, firstly to assess the differences between the three systems using a number of ontologies. Forest and Med are very small.
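A toy fragment of such a conversion routine is sketched below. It handles only atomic KRSS axioms of the form `(define-primitive-concept C D)` and emits OWL functional syntax; the full routine would also need the EL constructs SNOMED CT uses (conjunction, existential restriction), which are omitted here:

```python
import re

def krss_to_owl(line):
    """Convert one atomic KRSS primitive-concept axiom to an OWL
    functional-syntax SubClassOf axiom (illustrative subset only)."""
    m = re.match(r"\(define-primitive-concept (\S+) (\S+)\)", line)
    if m:
        child, parent = m.groups()
        return f"SubClassOf(:{child} :{parent})"
    raise ValueError("unsupported axiom: " + line)

# krss_to_owl("(define-primitive-concept Fracture Disorder)")
# -> "SubClassOf(:Fracture :Disorder)"
```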

Example        CEL (sec)        FaCT++ (sec)    RacerPro (sec)
Forest         0.010            0.791           0.611
Med            0.010            0.440           0.311
Part-whole     0.010            0.250           0.220
Galen          9                92.764          327.01
SNOMED CT      1974 (32m 54s)   -               -

Time taken to process some trial ontologies of various sizes.

The output hierarchies of the reasoners are the same for the Forest and Med examples. The computed hierarchy of CEL differs from those of FaCT++ and RacerPro in the case of the part-whole ontology.

Considering the time each reasoner took to classify the ontologies, CEL appears to perform the best, but it produces a hierarchy that differs from those of the other two reasoners. CEL also appears to be satisfactory for processing a large ontology like SNOMED CT. Future work will compare the outputs of the three reasoners to understand how they generate different universes of knowledge and the consequences that may have for reasoning in a health information system.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Mapping SNOMED CT to ICD-10-AM
Jon Patrick and Kenneth Chik
School of Information Technologies
Faculty of Engineering
University of Sydney

Donna Truran, Ming Zhang
National Centre for the Classification in Health
University of Sydney


The primary focus of this project is to create an automatic mapping program from SNOMED CT (SCT) to ICD-10-AM. This mapping is important because it enables knowledge stored in SNOMED CT to be automatically represented in ICD-10-AM, without needing to manually recode it.

Moving beyond a simple lexical matching approach, our mapping algorithm uses the hierarchy of SNOMED CT to obtain more information for a more accurate lexical match. The appropriate amount of generalisation to use as additional information is determined by a triangulation technique. A mapping from SCT to ICD-9-CM using published tables is performed, and then a reverse mapping from ICD-9-CM back to a more generalised concept higher in the SCT ontology is performed via the UMLS Metathesaurus maps. The descriptions of all the concepts in the SCT ontology tree below this more generalised concept are then used as the text to find concept matches in the ICD-10-AM hierarchy. A Bayesian statistical technique is used for the lexical matching algorithm.
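The triangulation can be walked through with toy data. All codes, mapping tables and descriptions below are invented for illustration; only the shape of the pipeline (SCT to ICD-9-CM, ICD-9-CM back to a generalised SCT ancestor, then descendant descriptions feeding the lexical matcher) follows the description above:

```python
# Invented stand-ins for the published SCT -> ICD-9-CM tables and the
# UMLS ICD-9-CM -> SCT maps.
sct_to_icd9 = {"sct:123": "icd9:250.0"}
icd9_to_sct_ancestor = {"icd9:250.0": "sct:100"}     # more general concept
sct_children = {"sct:100": ["sct:123", "sct:124"],
                "sct:123": [], "sct:124": []}
sct_descriptions = {"sct:100": ["diabetes mellitus"],
                    "sct:123": ["type 2 diabetes mellitus"],
                    "sct:124": ["type 1 diabetes mellitus"]}

def descendants(concept):
    """The concept plus everything below it in the SCT tree."""
    out = [concept]
    for c in sct_children.get(concept, []):
        out.extend(descendants(c))
    return out

def candidate_text(source_concept):
    """Gather the descriptions used as input to the lexical matcher."""
    ancestor = icd9_to_sct_ancestor[sct_to_icd9[source_concept]]
    return [d for c in descendants(ancestor) for d in sct_descriptions[c]]

texts = candidate_text("sct:123")
# texts covers the generalised ancestor and all its descendants.
```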

This approach produced an increase in both mapping accuracy and speed compared to previous attempts at the problem at the University of Sydney.

The technique developed is not specific to SNOMED CT and ICD-10-AM. It is flexible enough to convert from SNOMED CT, or other rich terminologies, ontologies and classification systems (TOCs), to targets other than ICD-10-AM, provided an intermediate coding scheme is available for use.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Data Retrieval from the HIE and CARDS for Modelling Access Block Times

Peng Gao

The aim of this study was to tackle two problems: first, to perform data mining on the information systems of the Emergency Department and the Cardiology Network of Westmead Hospital to build a model of how the interaction between the two departments affects Access Block times; and second, to develop methods to compute the KPIs developed in the KPI project.

The work proceeded in two stages. The first stage involved studying the data models of the HIE database to identify the Emergency Department data, and studying the CARDS data model to extract the data about cardiology patients. The second stage was to link the records between the two systems.
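The record-linkage stage can be sketched as a simple join on a shared patient identifier. The field names and records below are invented, as the HIE and CARDS data models are not public:

```python
# Hypothetical extracts from the two systems, keyed by medical record number.
hie_records = [{"mrn": "1001", "ed_arrival": "2008-03-01"},
               {"mrn": "1002", "ed_arrival": "2008-03-02"}]
cards_records = [{"mrn": "1001", "cath_date": "2008-03-01"}]

def link(hie, cards):
    """Join ED and cardiology records on medical record number."""
    index = {r["mrn"]: r for r in cards}
    return [(h, index[h["mrn"]]) for h in hie if h["mrn"] in index]

linked = link(hie_records, cards_records)
# Only patient 1001 appears in both systems, so one pair is linked.
```

Real linkage would of course need to handle identifier mismatches and timing constraints; this shows only the basic join.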

The final step in this work is to build the computational model of access block timings using the linked records. This will be completed by another project.

_______________________________________________
Gpcg_talk mailing list
[email protected]
http://ozdocit.org/cgi-bin/mailman/listinfo/gpcg_talk
