Dear List, quite a number of people have asked me for details of the
work we presented in our Showcase. The showcase was successful, with
15 attendees from SWAHS, NCCH, SESIAHS, NHMRCCTC, MIMS,
Children's Westmead, a Pulse journalist, and a few private representatives.
The discussions were most vigorous around the SNOMED to ICD-10 mapping
and the data mining of the ICU information systems at the RPAH and
Children's Westmead.
As a curiosity, one of the students showed that in one information system
staff opted to use free text to enter the value of a field rather than
choose from a fixed list, and managed to write "room air" in 99
different ways. Moral of the story - don't let people write it their own
way.
If you are going to respond to this message PLEASE remember to remove
all the text of the abstracts from your outgoing response - otherwise
we'll all end up with very long subsequent messages.
cheers
jon patrick
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Foundations of a Data Analytics System for an Intensive Care Unit
Jon Patrick and Glen Pink
School of Information Technologies
Faculty of Engineering
University of Sydney
David Schell, Jonothan Gillis
Pediatric Intensive Care Unit
Children's Hospital, Westmead
Data extraction and the use of data mining techniques on medical records
offer an extensive range of benefits to intensive care units for
research and administration. However, success is greatly dependent on
the quality of the data collected automatically by monitors and the data
manually entered by staff. Furthermore, the task of retrieving
appropriate data from an information system is made all the harder by
very poor interfaces for defining medical terminology and framing
questions to explain research topics. The general problems of creating
an effective data analytics description language and execution engine
need to be grounded in an understanding of the state of play of existing
ICU information systems. This paper reports on an exploration of one such
system to understand the foundations available for attaining a
comprehensive data analytics environment for a clinician.
This study reviews a number of aspects of the data collection and
storage system at the Children's Hospital Westmead Pediatric Intensive
Care Unit (PICU), namely:
- methods and considerations for cleaning collected data, and
- the provision of limited reporting facilities that can be generalised
to encompass collecting and processing all ICU data.
The initial goal was to develop a query engine for the PICU system;
however, a number of problems were encountered, namely:
- the structure of the backend database was internally inconsistent,
- documentation was lacking,
- the various classes of medical data in the database tables had to be
located manually, as the database names are not standardised, and
- the contents of each table could only be estimated.
This insufficiency in the semantics of the database contents resulted
in an inability to automate the building of a general-purpose query
system for the database.
Data oversights and errors were discovered in the database, such as
missing values for when a patient leaves the PICU, typical data entry
errors, and data entry errors due to the automation of data entry, which
can be filtered. Defaults for entered string values are available from
the front-end; however, it is common for incorrectly spelt custom values
to be entered by the user (there are 99 different renderings of the term
"Room Air"). This phenomenon results in a large number of values that
must be filtered and grouped in order to collect together equivalent
attributes so as to present meaningful results. Hence significant manual
input from the medical staff is required to provide appropriate
groupings. However, automated grouping must play some part, as fully
manual grouping is not feasible given the size of the database.
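The kind of automated grouping described above - collapsing the many
spellings of "Room Air" onto one canonical value before manual review -
can be sketched roughly as follows (function names and the
normalisation rules are illustrative assumptions, not the project's
actual algorithm):

```python
import re
from collections import defaultdict

def canonical_key(value: str) -> str:
    """Reduce a free-text entry to a crude canonical key:
    lowercase, strip punctuation, collapse whitespace."""
    key = re.sub(r"[^a-z0-9 ]", " ", value.lower())
    return re.sub(r"\s+", " ", key).strip()

def group_variants(values):
    """Group spelling variants of the same concept under one key.
    Keys that normalisation alone cannot unify (e.g. "RoomAir")
    still require manual merging by medical staff."""
    groups = defaultdict(list)
    for v in values:
        groups[canonical_key(v)].append(v)
    return dict(groups)

entries = ["Room Air", "room air", "Room-Air", "ROOM AIR ", "RoomAir"]
groups = group_variants(entries)
```

Note that "RoomAir" survives as a separate group, which is why the
abstract argues that manual input from medical staff must complement
the automated step.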
In coordination with medical staff, a set of algorithms was developed to
remove noise from the data by allowing for certain concessions for data
entry error. Given automated data entry, incorrect data may persist for
several hours and can only be filtered to a certain degree. Thresholds
for different error types were determined primarily by limits for
reporting requirements; however, final discretion for these thresholds
lies in the hands of the end user. Once noise is removed, the data can be
appropriately displayed in reports as required by the end user.
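A minimal sketch of the threshold-based filtering described above, with
entirely hypothetical limits (the real thresholds were set by reporting
requirements and end-user discretion):

```python
def filter_noise(readings, lower, upper):
    """Drop automatically collected readings outside a plausible
    range; the thresholds are supplied by the end user."""
    return [r for r in readings if lower <= r <= upper]

# e.g. heart-rate samples with obvious sensor glitches (0 and 310)
hr = [112, 108, 0, 115, 310, 110]
clean = filter_noise(hr, lower=30, upper=250)
```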
Medical staff assisted in the interpretation of the data in the database
so as to create appropriate documentation. This led to producing a
small data analytics system which is significantly enhanced by the use
of data cleaning techniques. However, to obtain optimum results it is
necessary to enforce consistency in the nomenclature of entered data. On
top of this work it will be possible to build data mining and natural
language processing approaches, including trend analysis and forecasting,
developing into a full data analytics system for ICUs.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A Data Analytics Environment for an ICU Information System
Jon Patrick and Victor Chan
School of Information Technologies
Faculty of Engineering
University of Sydney
Robert Herkes and Angela Ryan
Intensive Care Unit
Royal Prince Alfred Hospital
The long term objective of this research is to understand the nature of
the analytical questions the physicians wish to have answered in their
various roles as clinician, or researcher, or administrator. In the
interim the task is to understand the routine way in which the
information system (IS) is used, the data stored in it, and the workflow
processes that revolve around it. This particular study has demonstrated
two instances of these objectives by the generation of administrator
reports, that is, correctly retrieving ventilation hours, and likewise,
one sample clinical case, that is, auto-population of daily assessment
reports. The outcomes of these tests represent the first steps in a
long-term plan to create a generalized data analytics engine for ICU
information systems.
At the Royal Prince Alfred Hospital, the clinical information system,
CareVue, has a front-end that allows staff to enter a patient’s clinical
information into a database. Behind this front-end is a back-end
which uses two separate databases: the real-time database and the
archival or historical database. While the real-time database stores
data only for patients who are currently or recently admitted to
the ICU, the historical data warehouse stores the data of every patient
that has been admitted to the ICU since 2002.
The first stage of the project involved working on the historical
database, which is configured much more like a data warehouse. In
order to understand the general needs for data analytics we
concentrated on a particular question, namely the profile of
ventilation of patients. For example, a question of interest is
“finding the total number of invasive and non-invasive hours that a
particular patient has spent during their stay in the unit.” In order to
produce a generalised strategy for asking all similar questions, a deep
understanding of the database architecture was required; an efficient
method for data extraction was then developed to present this data
in management reports or in a user interface.
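The ventilation-hours question above can be sketched against a toy
schema. The real CareVue warehouse schema is undocumented, so the table
and column names here are purely illustrative assumptions, and sqlite3
stands in for the actual database engine:

```python
import sqlite3

# Hypothetical, simplified schema -- illustrative only, not CareVue's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ventilation_episodes (
    patient_id INTEGER,
    mode TEXT,    -- 'invasive' or 'non-invasive'
    hours REAL
);
INSERT INTO ventilation_episodes VALUES
    (1, 'invasive', 36.5), (1, 'non-invasive', 12.0),
    (1, 'invasive', 5.5),  (2, 'invasive', 48.0);
""")

# Total invasive and non-invasive hours for one patient's stay
rows = conn.execute("""
    SELECT mode, SUM(hours)
    FROM ventilation_episodes
    WHERE patient_id = ?
    GROUP BY mode
""", (1,)).fetchall()
totals = dict(rows)
```

The hard part reported in the abstract is not the aggregation itself
but discovering which of the hundreds of undocumented tables hold the
episodes and how they join.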
The next stage of the project involved working on the real-time
database. The main task was to auto-populate a pre-defined template of
data for the General-ICU AM Ward Round with the corresponding data for
any patient. The first step in this task required different SQL queries
to retrieve all the relevant data. The second step required storage of
all this data in a temporary data store, and the final step required
auto-population of the template with the corresponding data in the data
store.
The main problems to emerge were a direct result of the lack of
documentation of the CareVue system. With the real-time database, even
though there are only about 100 patients, over 300 tables are used in
the database. Consequently, the main problem is identifying
where specific data are stored and how to link the tables
together to extract the data of interest. With the historical database,
since it stores over 11,000 patients, the main problem is retrieval
time. Identifying the location of specific data is difficult and the
solution comes from an exhaustive trial-and-error search. Only by
manually looking through each table can the meaning of data be established.
For future work, we wish to perform both real-time and historical
analysis simultaneously. By linking both the real-time and the
historical databases together, we will then have the ability to compare
a current case with the aggregated values over many cases.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Towards the Automatic Annotation of Medical Texts
Jon Patrick and Jessica Thallmaier
School of Information Technologies
Faculty of Engineering
University of Sydney
Kerrie McDonald
Kolling Institute of Medical Research
Royal North Shore Hospital
The aim of this project is to facilitate the creation of a system that
is able to automatically annotate oncologists' clinical notes about
brain tumour patients using Natural Language Processing (NLP).
The project has been sponsored by the Kolling Institute as part of their
genetic research program. Their ultimate goal through this project is to
build a computational model that incorporates the clinical notes of a
patient and the results of microarray analysis from their tumour tissue.
They hope to find relationships between certain gene expressions in the
patients and their response to specific medical treatments. This
research would enable the medical community to better tailor the
treatment of individual brain tumour sufferers to improve quality of
life and increase survival time. The automatic annotation of texts in
their natural form will be an invaluable step towards this goal,
allowing researchers to effectively analyse vast quantities of archived
data.
We used the methodology of manual annotation by several knowledgeable
individuals using a set of 34 annotation tags. These tags are agreed
upon by the various annotators, with specific rules for their application
to the data. While most tags apply to whole sentences and sections of
text, we allow for tags to be nested within other tags. Examples of such
tags are "chemotherapy medication" assigned within "chemotherapy", and
pathology data with its subsequent "tumour type", "tumour location",
and "tumour stage" tags. Annotators are required to adhere strictly to the
rules of each tag so as to produce the greatest level of inter-annotator
agreement.
We then employ machine learning techniques to automatically
annotate the same sample of clinical documents. We evaluate the success
of the processing by the level of precision and recall we are able to
achieve.
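The precision/recall evaluation mentioned above amounts to comparing
predicted annotations against the manual gold standard. A minimal
exact-match scorer (the tuple representation is an assumption for
illustration):

```python
def precision_recall(predicted, gold):
    """Score predicted annotations against the manual gold standard.
    Both are sets of (tag, start, end) tuples; exact-match scoring."""
    tp = len(predicted & gold)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {("chemotherapy", 0, 48), ("tumour type", 60, 75),
        ("tumour location", 80, 95)}
pred = {("chemotherapy", 0, 48), ("tumour type", 60, 74)}  # off by one
p, r = precision_recall(pred, gold)
```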
In the future we aim to expand the current annotated corpus and to
develop more accurate and precise methods to reliably reproduce the
annotations automatically. We analyse the text and the manual
annotations to identify possible sentence markers that attribute
certain tags to the text. In future we intend to use the Text to
SnomedCT (SCT) software developed by the team, which will allow us to use
the medical concepts to identify and automatically annotate samples of
our text.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A Generative Hospital Information Management System with Patient Tracking
William Chau and Jon Patrick
School of Information Technologies
Faculty of Engineering
University of Sydney
Research into a system that automatically generates local information
systems has resulted in the production of an experimental Hospital
Information System. This system provides for the electronic mimicking of
hospital forms, so that electronic data entry for patient recording
does not have to be altered from the current paper-based system. Such an
approach goes a long way towards supporting staff in transferring to a
full electronic record-keeping system with a minimum of dislocation.
In parallel with the record management system we have developed a
patient workflow system that automatically pushes the patient record,
and by implication the patient, from one process to the next when they
are finished with the current process. A clinical department then has the
metaphorical role of a train line along which the patient travels,
getting off at each point at which information has to be collected about
them and pushed back on when they have to move to the next process.
Following this metaphor, a hospital resembles a train network of
interconnecting train lines, where passengers either complete their
journey on the one line or change from one line (that is, department) to
another as the processing requirements demand.
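The train-line metaphor can be sketched as a department holding an
ordered sequence of processes, with completion of one process pushing
the patient to the next. The class and the process names below are
illustrative assumptions, not the prototype's actual design:

```python
class DepartmentLine:
    """A clinical department as a 'train line': an ordered sequence
    of processes (stations). Completing one process pushes the
    patient record on to the next."""
    def __init__(self, name, processes):
        self.name = name
        self.processes = list(processes)
        self.position = {}  # patient_id -> index of current process

    def admit(self, patient_id):
        self.position[patient_id] = 0

    def complete_current(self, patient_id):
        """Finish the current process; return the next process name,
        or None once the patient reaches the end of the line."""
        i = self.position[patient_id] + 1
        if i >= len(self.processes):
            del self.position[patient_id]  # ready to change lines
            return None
        self.position[patient_id] = i
        return self.processes[i]

ed = DepartmentLine("Emergency",
                    ["triage", "assessment", "treatment", "disposition"])
ed.admit(42)
nxt = ed.complete_current(42)  # patient pushed from triage onwards
```

Because every push is recorded, the network schematic described below
falls out of the same data.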
The implementation of this metaphor as a prototype Information System
with complete patient records has been completed. An emergent property
of the system is the tracking of the patient throughout the hospital,
which is simple, easy to implement, of very low cost, and requires only
a tiny amount of clerical time. The gain, however, is significant: the
CEO's dashboard, and that of every manager beneath them, would have a
network schematic showing where each patient is located in the network,
thereby identifying waiting rates, throughput rates and blockages either
within a department or across the whole hospital.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparison of Description Logic Reasoners for SNOMED CT
Jon Patrick and Varun Srivastrava
School of Information Technologies
Faculty of Engineering
University of Sydney
Donna Truran, Ming Zhang
National Centre for the Classification in Health
University of Sydney
Description Logics belong to the family of logic-based knowledge
representation languages which can be used to characterise the
terminological knowledge of an application domain like SNOMED CT. SNOMED
CT is a comprehensive set of concepts, terms and codes containing more
than 360,000 concepts, 450,000 medical descriptions and 1,200,000
concept relationships. Classifying such a large terminology and
establishing its trustworthiness is a great challenge for computer
scientists.
The reasoners considered in this study are:
1. CEL: Lisp-based reasoner for EL. It accepts inputs in a small
extension of the KRSS syntax.
2. FaCT++: C++-based reasoner for SHOIQ. It accepts inputs in OWL-DL and
supports the DIG API.
3. RacerPro: Lisp-based reasoner for SHIQ. It supports the OWL API and
the DIG API.
Comparing these reasoners is contentious, as they use different
ontological notations, and understanding which reasoner performs better
(in accuracy) for each style of ontology is difficult. Some of these
ontologies are not fully convertible to one another, and comparison
becomes difficult when the ontology used for the evaluation is very
large. Other bases for reasoner comparison include factors such as the
time the reasoner takes to perform the classification task.
The SNOMED CT stated view (unclassified SNOMED) is maintained in the
standard KRSS notation. The reasoners under study use either the EL+ or
the OWL ontology notation. The SNOMED CT stated view was converted
manually to the EL format accepted by the CEL reasoner. FaCT++ and
RacerPro use the OWL ontology notation, so the stated view in KRSS
syntax was converted to the OWL notation by programming a conversion
routine.
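A toy sketch of the kind of syntactic rewriting such a conversion
routine performs, for the simplest case of a KRSS primitive-concept
definition rendered as an OWL functional-syntax axiom. This is an
illustration of the idea only; real SNOMED CT axioms also involve
conjunctions and existential role restrictions, which the actual
routine must handle:

```python
import re

def krss_primitive_to_owl(line: str) -> str:
    """Rewrite a simple KRSS definition such as
    (define-primitive-concept Child Person)
    into an OWL functional-syntax SubClassOf axiom."""
    m = re.match(r"\(define-primitive-concept\s+(\S+)\s+(\S+)\)", line)
    if not m:
        raise ValueError("unsupported KRSS form")
    sub, sup = m.groups()
    return f"SubClassOf(:{sub} :{sup})"

axiom = krss_primitive_to_owl("(define-primitive-concept Child Person)")
```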
A number of experiments were conducted firstly to assess the differences
between the three systems using a number of ontologies. Forest and Med
are very small.
Example      CEL (sec)       FaCT++ (sec)    RacerPro (sec)
Forest       0.010           0.791           0.611
Med          0.010           0.440           0.311
Part-whole   0.010           0.250           0.220
Galen        9               92.764          327.01
SNOMED CT    32 min 54 sec   -               -

Table: Time taken to process some trial ontologies of various sizes.
The output hierarchies of the reasoners are the same for the Forest and
Med examples. The computed hierarchy of CEL differs from those of FaCT++
and RacerPro in the case of the part-whole ontology.
Considering the time each reasoner took to classify the ontology, CEL
appears to perform the best, but it produces a hierarchy which differs
from those of the other two reasoners. CEL also appears to be
satisfactory for processing a large ontology like SNOMED. Future work
will compare the outputs of the three reasoners to understand how they
generate different universes of knowledge and the consequences that may
have for reasoning in a health information system.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Mapping SNOMED CT to ICD-10-AM
Jon Patrick and Kenneth Chik
School of Information Technologies
Faculty of Engineering
University of Sydney
Donna Truran, Ming Zhang
National Centre for the Classification in Health
University of Sydney
The primary focus of this project is to create an automatic mapping
program to map from SNOMED CT (SCT) to ICD-10-AM. This mapping is
important, because it enables the automatic conversion of knowledge
stored in SNOMED CT to be represented in ICD-10-AM, without needing to
manually recode it.
Moving beyond a simple lexical matching approach, our mapping algorithm
uses the hierarchy of SNOMED CT to obtain more information for a more
accurate lexical match. The appropriate amount of generalisation to use
as additional information is determined by a triangulation technique. A
mapping from SCT to ICD-9-CM using published tables is performed, and
then a reverse mapping from ICD-9-CM back to a more generalised concept
higher in the SCT ontology is performed via the UMLS Metathesaurus maps.
All the descriptions of all the concepts in the SCT ontology tree below
this more generalised concept are then used as the text to find concept
matches in the ICD-10-AM hierarchy. A Bayesian statistical technique is
used for the lexical matching algorithm.
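To make the final step concrete, a Bayesian-style lexical match can be
sketched as scoring each candidate target description by smoothed token
likelihoods. This is an illustration of the general idea only, with
made-up descriptions and a made-up vocabulary size, not the project's
actual algorithm:

```python
import math
from collections import Counter

def match_score(source_tokens, target_tokens, vocab_size=10000):
    """Naive Bayes-style log-likelihood of the source text under a
    unigram model of the target description, with add-one smoothing."""
    counts = Counter(target_tokens)
    total = len(target_tokens)
    return sum(
        math.log((counts[t] + 1) / (total + vocab_size))
        for t in source_tokens
    )

# Text gathered from the SCT subtree vs. two candidate ICD descriptions
source = "malignant neoplasm of stomach".split()
cand_a = "malignant neoplasm of stomach unspecified".split()
cand_b = "benign neoplasm of colon".split()
best = max([cand_a, cand_b], key=lambda c: match_score(source, c))
```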
This approach produced an increase in both mapping accuracy and speed
when compared to previous attempts at the problem at the University of
Sydney.
The technique developed is not specific to SNOMED CT and ICD-10-AM. It
is flexible enough to be used to convert from SNOMED CT, or other rich
terminologies, ontologies, and classification systems (TOCs), to targets
other than ICD-10-AM, provided there is an intermediate coding scheme
available for use.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Data Retrieval from the HIE and CARDS for Modelling Access Block Times
Peng Gao
The aim of this study was to tackle two problems: firstly, to perform
data mining on the information systems of the Emergency Department and
the Cardiology Network of Westmead Hospital to build a model of the
interaction between the two departments and their effects on Access
Block times; and secondly, to develop methods to compute the KPIs
developed in the KPI project.
The work proceeded in two stages. The first stage involved studying the
data models of the HIE database to identify the Emergency Department
data, and a study of the CARDS data model to extract the data about
cardiology patients. The second stage was to link the records between
the two systems.
The final step in this work is to build the computational model of
access block timings using the linked records. This will be completed by
another project.
begin:vcard
fn:Jon Patrick
n:Patrick;Jon
org:Faculty of Engineering;School of Information Technologies
adr;dom:;;University of Sydney
title:Chair of Language Technology
x-mozilla-html:FALSE
url:http://www.it.usyd.edu.au/~jonpat
version:2.1
end:vcard
_______________________________________________
Gpcg_talk mailing list
[email protected]
http://ozdocit.org/cgi-bin/mailman/listinfo/gpcg_talk