Re: [OPENMRS-DEV] Latest improvements to the Patient Matching module

Shaun Grannis Thu, 01 Mar 2012 22:23:01 -0800

Great discussion. I agree wholeheartedly with the notions of CustomColumn (or 
maybe call it "DerivedColumn" or "DerivedTrait"?) and MatchSimilarityAlgorithm 
(although I might call this ColumnSimilarityAlgorithm, FieldSimilarityAlgorithm, 
or TraitSimilarityAlgorithm -- 'traits' being those data elements that describe 
an individual, like name, date of birth, gender, geographic location, etc.).


One thought I'd contribute regarding phonetic transformation: the deciding 
factor may not be ease of development (e.g., JavaScript versus Java versus 
Groovy -- I have no particular preference), but raw performance. 
Depending on the phonetic transformation algorithm -- and as you all likely 
know there are a few: Soundex, NYSIIS, Metaphone, Double Metaphone, each with 
their advantages and disadvantages among various ethnic groups -- the 
computational cost to create that transformation may be appreciable, 
potentially creating unacceptable execution times for the matching process.
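For readers less familiar with these transformations, here is a minimal sketch of classic American Soundex, the simplest of the algorithms named above, to show what a single transformation involves: one pass over the name with a table lookup per letter. The class name `Soundex` is illustrative only; a tested library such as Apache Commons Codec (which ships Soundex, Nysiis, Metaphone and DoubleMetaphone encoders) would be the more likely choice in practice.

```java
import java.util.Locale;

/** Illustrative helper (hypothetical name): classic American Soundex, letter + three digits. */
final class Soundex {
    private Soundex() {}

    /** Maps a letter to its Soundex digit; vowels, Y, H and W map to '0' (not coded). */
    private static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0';
        }
    }

    static String encode(String name) {
        // Keep letters only, upper-cased, so punctuation and spaces are ignored.
        String s = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) return "";
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char prev = code(s.charAt(0)); // the first letter's code counts for adjacency
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char d = code(c);
            // Skip uncoded letters and repeats of the previous code.
            if (d != '0' && d != prev) out.append(d);
            // H and W do not separate equal codes; vowels do (they reset prev to '0').
            if (c != 'H' && c != 'W') prev = d;
        }
        while (out.length() < 4) out.append('0'); // pad to letter + 3 digits
        return out.toString();
    }
}
```

For example, `Soundex.encode("Robert")` and `Soundex.encode("Rupert")` both yield `R163`, which is exactly the kind of equivalence a phonetic trait is meant to capture.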

We've intermittently debated the merits of on-the-fly transformation versus 
creating these derived fields when committing the original values. So I would 
ask the community: should "derived traits" or "custom columns" (such as 
phonetic transformations) be created at commit time, or on-the-fly at 
match time? Would one possible approach to answering this question be to 
conduct performance testing for each particular transformation or derivation 
using real-world data?
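The performance testing suggested here could start with a rough timing loop like the sketch below (the class and method names are hypothetical). The trade-off it informs: a derivation done at commit time costs roughly one call per stored value, while an on-the-fly derivation costs a call for each value in every candidate pair compared, unless results are cached. Hand-rolled loops are easily distorted by JIT warm-up and dead-code elimination, so a dedicated harness such as JMH is preferable for numbers worth publishing.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical helper: rough average cost, in nanoseconds, of one call to a transformation. */
final class TransformTiming {
    // Results are written here so the JIT cannot treat the timed work as dead code.
    private static volatile int blackhole;

    private TransformTiming() {}

    static double averageNanos(UnaryOperator<String> transform, List<String> values, int rounds) {
        if (values.isEmpty() || rounds <= 0) {
            throw new IllegalArgumentException("need at least one value and one round");
        }
        // Crude warm-up pass so the first timed calls are not purely interpreted.
        for (String v : values) blackhole += transform.apply(v).length();

        int sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String v : values) sink += transform.apply(v).length();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / ((long) rounds * values.size());
    }
}
```

Typical use would be `averageNanos(encoder, sampleNames, 1_000)`, where `encoder` is the phonetic transformation under test and `sampleNames` is a sample of real names from the target population.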

Another important question that may or may not have been answered, 
particularly for African names: which phonetic transformations work best 
(i.e., produce the maximum matching accuracy) for a particular ethnic 
group? Research, anyone? ;)

Best,

- Shaun


On Feb 28, 2012, at 6:08 PM, Dave Thomas wrote:

> Thanks Burke.
> 
> Actually, the phonetics encoders would be great candidates for scripting, 
> because ultimately, they're just a set of regex-driven string substitutions...
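To illustrate the "set of regex-driven string substitutions" idea, here is a minimal sketch of an encoder built from an ordered list of rules. The class name and the example rules are hypothetical and chosen only for illustration; they are not the namephonetics module's rules or any standard phonetic algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical encoder: applies ordered regex substitutions to a normalized name. */
final class RegexPhoneticEncoder {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> replacements = new ArrayList<>();

    /** Adds a rule; rules run in insertion order, each over the whole string. */
    RegexPhoneticEncoder rule(String regex, String replacement) {
        patterns.add(Pattern.compile(regex)); // compile once, reuse for every name
        replacements.add(replacement);
        return this;
    }

    String encode(String name) {
        // Normalize to upper-case letters only before applying the rules.
        String s = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        for (int i = 0; i < patterns.size(); i++) {
            s = patterns.get(i).matcher(s).replaceAll(replacements.get(i));
        }
        return s;
    }
}
```

With illustrative rules such as `.rule("PH", "F").rule("CK", "K").rule("(?<=.)[AEIOU]", "").rule("(.)\\1+", "$1")` (PH sounds like F, drop non-initial vowels, collapse doubled letters), "Philips" and "Phillips" both encode to `FLPS`. Because the rules are plain data, they are also a natural fit for loading from a script or a database table.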
> 
> d
> 
> On Mon, Feb 27, 2012 at 6:35 PM, Burke Mamlin <[email protected]> wrote:
> Dave,
> 
> Our immediate goals at AMPATH are to handle multiple identifiers and possibly 
> do distance comparisons of GPS coordinates.  Yours is another good example 
> of a handy matching algorithm.  The goal of creating an interface for 
> the algorithm is to allow developers to create any algorithm that meets 
> their needs.  So, instead of trying to solve every problem in the patient 
> matching module, we'll make it easy for devs to add customizations.  I've spoken 
> with Shaun Grannis and he likes the idea.
> 
> While we'll start by introducing custom algorithms, I'd also like to do the 
> same on the front end, i.e., allow for a custom column (per Shaun, this 
> should be called a "custom trait"), so devs could easily throw classes on the 
> classpath to expand the list of available traits (columns) as well as the 
> list of possible methods of matching data.  This would open the door for 
> popular customizations to eventually be packaged with the module.  And I'd 
> love to see a set of customizations use the scripting interface built into 
> Java (JSR-223) to create custom traits & algorithms that could bind the 
> patient and API to a scripting context and execute user-written scripts for 
> truly customizable matching (of course it would need to be efficient, but 
> javax.script defines a CompiledScript, so maybe it's not just a dream).  
> Imagine being able to write your namephonetics algorithm in Groovy or 
> JavaScript and use it within the patient matching module. ;-)
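A minimal sketch of the JSR-223 idea, assuming the `MatchSimilarityAlgorithm` interface proposed later in this thread: the script is compiled once via `javax.script.Compilable` when the engine supports it, and each comparison binds the two values as `val1` and `val2`. The class name and the binding names are hypothetical. Recent JDKs (15+) no longer bundle a JavaScript engine, so a JSR-223 engine such as Groovy's must be on the classpath.

```java
import javax.script.Bindings;
import javax.script.Compilable;
import javax.script.CompiledScript;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

/** Interface as proposed in this thread. */
interface MatchSimilarityAlgorithm {
    String getName();
    double getMatchSimilarity(String val1, String val2);
}

/** Hypothetical: a similarity algorithm whose logic is a user-written JSR-223 script. */
final class ScriptedSimilarityAlgorithm implements MatchSimilarityAlgorithm {
    private final String name;
    private final String source;
    private final ScriptEngine engine;
    private final CompiledScript compiled; // null when the engine cannot precompile

    ScriptedSimilarityAlgorithm(String name, String engineName, String source) throws ScriptException {
        ScriptEngine e = new ScriptEngineManager().getEngineByName(engineName);
        if (e == null) {
            throw new IllegalArgumentException("No JSR-223 engine named " + engineName);
        }
        this.name = name;
        this.source = source;
        this.engine = e;
        // Compile once up front so each comparison avoids re-parsing the script.
        this.compiled = (e instanceof Compilable) ? ((Compilable) e).compile(source) : null;
    }

    @Override public String getName() { return name; }

    /** Binds the values as "val1"/"val2"; a non-numeric result or a script error scores 0. */
    @Override public double getMatchSimilarity(String val1, String val2) {
        Bindings bindings = engine.createBindings();
        bindings.put("val1", val1);
        bindings.put("val2", val2);
        try {
            Object result = (compiled != null) ? compiled.eval(bindings) : engine.eval(source, bindings);
            return (result instanceof Number) ? ((Number) result).doubleValue() : 0.0;
        } catch (ScriptException ex) {
            return 0.0;
        }
    }
}
```

Assuming a Groovy engine is available, `new ScriptedSimilarityAlgorithm("Groovy exact", "groovy", "val1 == val2 ? 1.0 : 0.0")` would behave like an exact-match algorithm. Whether one engine instance may be shared across threads depends on the engine, so concurrent matching may need one instance per thread.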
> 
> Small steps...
> 
> -Burke
> 
> 
> On Mon, Feb 27, 2012 at 5:16 PM, Dave Thomas <[email protected]> wrote:
> Hi.  I haven't followed this discussion very closely, but if you're creating 
> interfaces in the patient matching module, I just wanted to register the 
> use case: it would be really cool if there were interfaces exposed in 
> patient matching that would allow namephonetics to register a phonetic 
> equivalence test as part of the matching criteria.  
> 
> (meaning finding names that are phonetically the same, even if spelled 
> differently, according to the phonetics structure of the local language)
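One way such a phonetic equivalence test could plug into the `MatchSimilarityAlgorithm` interface proposed in this thread is a small adapter that scores 1.0 when two names share a phonetic code. This is a sketch: the class name is hypothetical, and the encoder is any `UnaryOperator<String>`, standing in for whatever a namephonetics module would actually expose.

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

/** Interface as proposed in this thread. */
interface MatchSimilarityAlgorithm {
    String getName();
    double getMatchSimilarity(String val1, String val2);
}

/** Hypothetical adapter: two names match (1.0) when their phonetic codes are equal. */
final class PhoneticEquivalenceSimilarity implements MatchSimilarityAlgorithm {
    private final String name;
    private final UnaryOperator<String> encoder; // stand-in for a namephonetics encoder

    PhoneticEquivalenceSimilarity(String name, UnaryOperator<String> encoder) {
        this.name = Objects.requireNonNull(name);
        this.encoder = Objects.requireNonNull(encoder);
    }

    @Override public String getName() { return name; }

    @Override public double getMatchSimilarity(String val1, String val2) {
        if (val1 == null || val2 == null) return 0.0; // missing data never matches
        String c1 = encoder.apply(val1);
        String c2 = encoder.apply(val2);
        // An empty code carries no phonetic information, so it is not treated as a match.
        return !c1.isEmpty() && c1.equals(c2) ? 1.0 : 0.0;
    }
}
```

Keeping the encoder as a pluggable function means the matching module does not need to know which language-specific phonetic rules are in use.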
> 
> d
> 
> 
> On Mon, Feb 27, 2012 at 10:33 AM, Burke Mamlin <[email protected]> 
> wrote:
> All we need (at least to get started) is for the algorithm that is already 
> stored in the database to allow for custom values, e.g., 
> "ke.or.ampath.matching.FooBarMatchSimilarityAlgorithm".  We will supply our 
> own algorithms on the classpath.  As long as the patient matching module 
> calls algorithm.getMatchSimilarity(v1,v2) when comparing values, we're good.
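A sketch of how the value stored in the database could be turned into an algorithm instance: built-in algorithms are looked up by display name, and anything else is treated as a fully qualified class name on the classpath, loaded once and cached. The `AlgorithmResolver` class is hypothetical; the interface is the one proposed in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Interface as proposed in this thread. */
interface MatchSimilarityAlgorithm {
    String getName();
    double getMatchSimilarity(String val1, String val2);
}

/** Hypothetical resolver: maps a stored algorithm value to an instance (not thread-safe). */
final class AlgorithmResolver {
    private final Map<String, MatchSimilarityAlgorithm> cache = new HashMap<>();

    /** Registers a built-in algorithm under its display name, e.g. "Exact Match". */
    void register(MatchSimilarityAlgorithm algorithm) {
        cache.put(algorithm.getName(), algorithm);
    }

    /** Returns a built-in by name, else loads (once) a classpath class by its FQN. */
    MatchSimilarityAlgorithm resolve(String stored) {
        MatchSimilarityAlgorithm known = cache.get(stored);
        if (known != null) return known;
        try {
            Class<?> clazz = Class.forName(stored);
            if (!MatchSimilarityAlgorithm.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(stored + " does not implement MatchSimilarityAlgorithm");
            }
            MatchSimilarityAlgorithm created =
                (MatchSimilarityAlgorithm) clazz.getDeclaredConstructor().newInstance();
            cache.put(stored, created); // reuse the instance for later comparisons
            return created;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load algorithm " + stored, e);
        }
    }
}
```

Resolving once per strategy, rather than per comparison, keeps reflection out of the inner matching loop.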
> 
> Eventually, we'd like to do something similar with column definitions, 
> something like:
> 
> import org.openmrs.Patient;
> 
> interface CustomColumn {
>   public String getName();
>   public String getValue(Patient patient);
> }
> 
> But we can focus on the algorithm first, since it will solve our immediate 
> problems.  As long as we can get agreement on a patch, then AMPATH can apply 
> the patch locally to move forward with custom algorithms while the patient 
> matching module is being updated.
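As an illustration of a custom column, here is a sketch of the "concatenate all patient identifiers" column mentioned elsewhere in this thread. Because `org.openmrs.Patient` is not available in a standalone snippet, a minimal hypothetical `Patient` stand-in with a `getIdentifiers()` list is used instead; the column class name is also hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Minimal stand-in for org.openmrs.Patient, for illustration only. */
final class Patient {
    private final List<String> identifiers;
    Patient(List<String> identifiers) { this.identifiers = identifiers; }
    List<String> getIdentifiers() { return identifiers; }
}

/** Interface as proposed in this thread. */
interface CustomColumn {
    String getName();
    String getValue(Patient patient);
}

/** Hypothetical derived column: all identifiers, sorted and joined, so order does not matter. */
final class AllIdentifiersColumn implements CustomColumn {
    @Override public String getName() { return "All Identifiers"; }

    @Override public String getValue(Patient patient) {
        List<String> ids = new ArrayList<>(patient.getIdentifiers()); // copy; don't mutate the patient
        Collections.sort(ids);
        return String.join("|", ids);
    }
}
```

Sorting before joining means two records holding the same identifiers in a different order produce the same column value, so even an exact-match algorithm can compare them.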
> 
> Cheers,
> 
> -Burke
> 
> 
> On Mon, Feb 27, 2012 at 1:03 PM, Suranga Kasthurirathne 
> <[email protected]> wrote:
> 
> Hi,
> 
> This sounds quite practical to me. I understand what you're trying to do. (Of 
> course, I'm not THE top expert; we'll need to consult with Dr. Shaun too.)
> In regard to improving the MatchResult class, I think that we should follow 
> the second plan that Dr. Burke suggested.
> However, the question is: do we need database support for this 
> functionality? I don't think so.
> 
> In response to Jeremy's questions, I feel that these changes can be fitted 
> into the current design quite easily.
> And also (pending Dr. Shaun's comments) I feel that these changes can go into 
> the trunk itself.
> Looking at the possibility of custom columns: once again, this is possible, 
> but not easy. It will need some refactoring. Also, where are we 
> going to store information on the custom columns? Whether in a flat file or 
> a database, we're still going to need some major changes.
> And finally, will creating custom columns affect our calculations in any 
> way? There are a lot of calculations happening under the hood, so we'll need 
> to check that out as well.
> 
> I'd like to help you out with this in any way that I can.
> 
> 
> 
> 
> On Mon, Feb 27, 2012 at 9:11 PM, Suranga Kasthurirathne 
> <[email protected]> wrote:
> Hi,
> 
> I'm on a bus right now, so I could not go through the mails properly yet. I 
> will try to come up with some sensible comments as soon as possible :-)
> 
> On 27 Feb 2012 20:22, "Burke Mamlin" <[email protected]> wrote:
> More specifically, we'd like to refactor 
> MatchResult.getSimilarityScore(String demographic) so that it uses an 
> interface instead of a switch statement to match values.
> 
> For example:
> 
> interface MatchSimilarityAlgorithm {
>   public String getName();
>   public double getMatchSimilarity(String val1, String val2);
> }
> 
> then creating classes like:
> 
> class ExactMatchSimilarity implements MatchSimilarityAlgorithm {
>   public String getName() {
>     return "Exact Match";
>   }
>   public double getMatchSimilarity(String val1, String val2) {
>     return StringMatch.getExactMatchSimilarity(val1, val2);
>   }
> }
> 
> class LCSMatchSimilarity implements MatchSimilarityAlgorithm {
>   public String getName() {
>     return "Longest Common Subsequence";
>   }
>   public double getMatchSimilarity(String val1, String val2) {
>     return StringMatch.getLCSMatchSimilarity(val1, val2);
>   }
> }
> 
> etc...
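`StringMatch` is the module's own helper and its code is not shown in this thread. For comparison, a fully self-contained implementation could compute the longest common subsequence directly, as sketched below. The class name is hypothetical, and the normalization 2·LCS / (|a| + |b|) is an assumed, commonly used choice that may differ from what `StringMatch.getLCSMatchSimilarity` actually does.

```java
/** Interface as proposed in this thread. */
interface MatchSimilarityAlgorithm {
    String getName();
    double getMatchSimilarity(String val1, String val2);
}

/** Hypothetical self-contained LCS similarity; the normalization is an assumption. */
final class SelfContainedLcsSimilarity implements MatchSimilarityAlgorithm {
    @Override public String getName() { return "Longest Common Subsequence (sketch)"; }

    @Override public double getMatchSimilarity(String val1, String val2) {
        if (val1 == null || val2 == null) return 0.0;
        if (val1.isEmpty() && val2.isEmpty()) return 1.0; // two empty values are identical
        return 2.0 * lcsLength(val1, val2) / (val1.length() + val2.length());
    }

    /** Classic O(n*m) dynamic programme, keeping only two rows of the table. */
    static int lcsLength(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                curr[j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? prev[j - 1] + 1
                        : Math.max(prev[j], curr[j - 1]);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // finished row becomes "prev"
        }
        return prev[b.length()];
    }
}
```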
> 
> This would make algorithm a MatchSimilarityAlgorithm instead of a String.  
> The switch statement in MatchResult.getSimilarityScore(String) can then be 
> replaced with:
> 
> return algorithm.getMatchSimilarity(val1, val2);
> 
> This would allow us to introduce new algorithms.  Of course, the UI would 
> need some minor refactoring to use the MatchSimilarityAlgorithm.getName() 
> method to get names of algorithms.
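One possible way (an assumption, not something proposed in the thread) to let devs "throw classes on the classpath" and have the UI list them is Java's standard `java.util.ServiceLoader`: each jar lists its implementations in a `META-INF/services/` file named after the interface's fully qualified name, and the UI builds its drop-down from `getName()`. The `AlgorithmCatalog` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Interface as proposed in this thread. */
interface MatchSimilarityAlgorithm {
    String getName();
    double getMatchSimilarity(String val1, String val2);
}

/** Hypothetical: combines built-in algorithms with any discovered on the classpath. */
final class AlgorithmCatalog {
    private AlgorithmCatalog() {}

    static List<MatchSimilarityAlgorithm> discover(List<MatchSimilarityAlgorithm> builtIns) {
        List<MatchSimilarityAlgorithm> all = new ArrayList<>(builtIns);
        // ServiceLoader reads META-INF/services/<interface FQN> entries from every jar.
        for (MatchSimilarityAlgorithm a : ServiceLoader.load(MatchSimilarityAlgorithm.class)) {
            all.add(a);
        }
        return all;
    }

    /** Display names for the UI, in discovery order. */
    static List<String> names(List<MatchSimilarityAlgorithm> algorithms) {
        List<String> names = new ArrayList<>();
        for (MatchSimilarityAlgorithm a : algorithms) names.add(a.getName());
        return names;
    }
}
```

Since OpenMRS modules already load their own classes, the module's classloader might be a better fit than ServiceLoader's default; this sketch only shows the general pattern.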
> 
> If those changes seem too bold at first, then a quicker step in that direction 
> would be to introduce the interface, allow algorithm to be a class name, and 
> refactor the last (default) case in the switch statement to something like:
> 
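> default:
>   try {
>     // "algorithm" holds a fully qualified class name found on the classpath
>     Class<?> clazz = Class.forName(algorithm);
>     if (MatchSimilarityAlgorithm.class.isAssignableFrom(clazz)) {
>       MatchSimilarityAlgorithm custom = (MatchSimilarityAlgorithm)
>           clazz.getDeclaredConstructor().newInstance();
>       return custom.getMatchSimilarity(val1, val2);
>     }
>   } catch (ReflectiveOperationException e) {
>     // unknown or non-instantiable class: fall through to "no similarity"
>   }
>   return 0;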
> 
> This wouldn't fully customize algorithms, but would at least allow us to 
> sneak in our own algorithms during matching.
> 
> Cheers,
> 
> -Burke
> 
> On Mon, Feb 27, 2012 at 8:49 AM, Jeremy Keiper <[email protected]> wrote:
> Suranga, James, Burke, Shaun, etc -
> 
> AMPATH has a development-backed immediate interest in these custom column 
> definitions for acquiring and comparing attributes of patient data.  If there 
> is any way that this can be fitted into the current design, could we get some 
> direction on how it could work and whether the Patient Matching Module 
> authors would like to incorporate it into trunk or allow us to branch it for
> those purposes?
> 
> Thanks!
> 
> Jeremy Keiper
> OpenMRS Core Developer
> AMPATH / IU-Kenya Support
> 
> 
> 
> On Tue, Feb 21, 2012 at 10:00 PM, Rajib Sengupta <[email protected]> wrote:
> Hello All,
> As a new implementer of OpenMRS, I cannot resist chiming in on this 
> discussion.
> 
> If the web UI of patient matching could be made configurable, it would indeed 
> be a great feature. Primarily, this would help implementation teams that are 
> short on developer resources, and implementations in new geographic regions 
> where patient matching may need to be based on different patient attributes 
> that are not available in the current UI out of the box.
> 
> Again, this is just our opinion, and we want to share it with the development 
> team so that development items can be prioritized from a usability 
> perspective.
>  
> Thanks,
> Rajib
> 
> From: Suranga Kasthurirathne <[email protected]>
> To: [email protected] 
> Sent: Tuesday, February 21, 2012 12:21 PM
> Subject: Re: [OPENMRS-DEV] Latest improvements to the Patient Matching module
> 
> 
> Hi,
> 
> I agree. Right now, the web-based UI that you'll get by loading the module 
> doesn't actually let you make use of the extensive (really really really 
> extensive) functionality the patientmatching module can offer. In fact, 
> sometimes I feel that the web UI is trying to hide its true awesomeness.
> If the web UI let you make use of all the extensive functionality that the 
> module offers, it would be one of our top favorite modules.
> However, that said, I'm not sure which of the two above-mentioned efforts 
> should get priority over the other. I'm OK with whatever the experts agree on 
> :-)
> 
> 
> On Tue, Feb 21, 2012 at 11:24 AM, Darius Jazayeri <[email protected]> 
> wrote:
> It certainly has potential as a GSoC project.
> 
> That said, my main concern with the Patient Matching module has been that the 
> UI is not easily approachable by anyone who hasn't built up some domain 
> knowledge on patient matching. I believe this has been improved, though I'm 
> not sure how much. Making it usable by an admin with only basic OpenMRS 
> knowledge would be a higher priority for me. (Unless that's already been 
> done!)
> 
> -Darius
> 
> 
> On Mon, Feb 20, 2012 at 8:50 PM, Suranga Kasthurirathne 
> <[email protected]> wrote:
> 
> Hi,
> 
> To the best of my knowledge, the Patient Matching module provides multiple 
> ways of reading in data (txt files, one or two databases)
> but doesn't really provide you with an interface for data columns.
> So right now, it seems that we can't add derived columns like 
> the ones you and Darius mentioned.
> 
> However, it does provide an interface for Analyzers (which are comparable to 
> algorithms), so users can write their own analyzers if they wish to do so.
> I understand the functionality that you are talking about, and I realize that 
> it's an important and useful feature.
> Possibly, this will be easier to implement once the database changes are in.
> 
> I will talk to James regarding this, and get his feedback on my comments....
> 
> @Dr. Burke, @Darius, mmm.... does this sound like a good GSoC idea to you? 
> Maybe Dr. Grannis will also be interested :-)
> 
> 
> Thanks and best regards,
> Suranga
> 
> 
> 
> On Tue, Feb 21, 2012 at 5:44 AM, Darius Jazayeri <[email protected]> 
> wrote:
> What Burke describes, with the ability to write those as Groovy and store 
> them in the DB, would be pretty neat. :-)
> 
> -Darius
> 
> 
> On Mon, Feb 20, 2012 at 2:08 PM, Burke Mamlin <[email protected]> wrote:
> Suranga,
> 
> Does the patient matching module provide interfaces for data source columns 
> and/or algorithms so implementations could define custom column(s) and/or 
> custom comparison algorithm(s)?  For example, I'm imagining something like:
> 
> import org.openmrs.Patient;
> 
> interface DataSourceColumn {
>   public String name();
>   public String getValue(Patient patient);
> }
> 
> interface Algorithm {
>   public String getName();
>   public float match(String a, String b);
> }
> 
> Assuming the out-of-the-box patient demographics implement the 
> DataSourceColumn interface, it would be easy to add new derived columns 
> (e.g., a region-specific soundex algorithm, a column that concatenates all 
> patient identifiers, or a column that combines latitude & longitude from 
> address).  And assuming that "Exact Match" and other out-of-the-box 
> comparison algorithms implemented the Algorithm interface, then it would be 
> easy to add new custom algorithms (e.g., matching GPS coordinates based on 
> computed distance).
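A sketch of the GPS example against the `Algorithm` interface above: two values are compared by great-circle (haversine) distance on a spherical Earth of mean radius 6371 km, and the distance is mapped to a similarity that falls linearly from 1 at 0 km to 0 at a cutoff. The `"lat,lon"` decimal-degree value format, the linear decay, and the cutoff parameter are all assumptions made for illustration, as is the class name.

```java
/** Interface as proposed above. */
interface Algorithm {
    String getName();
    float match(String a, String b);
}

/** Hypothetical: similarity of two "lat,lon" strings from haversine distance. */
final class GpsDistanceAlgorithm implements Algorithm {
    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius
    private final double maxKm; // distance at or beyond which similarity is 0 (assumed cutoff)

    GpsDistanceAlgorithm(double maxKm) {
        if (maxKm <= 0) throw new IllegalArgumentException("maxKm must be positive");
        this.maxKm = maxKm;
    }

    @Override public String getName() { return "GPS Distance"; }

    @Override public float match(String a, String b) {
        double[] p = parse(a);
        double[] q = parse(b);
        if (p == null || q == null) return 0f; // missing or malformed coordinates
        double d = haversineKm(p[0], p[1], q[0], q[1]);
        return (float) Math.max(0.0, 1.0 - d / maxKm); // linear decay to 0 at maxKm
    }

    /** Great-circle distance in km between two points given in decimal degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double h = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /** Parses "lat,lon"; returns null when the value is missing or malformed. */
    private static double[] parse(String s) {
        if (s == null) return null;
        String[] parts = s.split(",");
        if (parts.length != 2) return null;
        try {
            return new double[] { Double.parseDouble(parts[0].trim()), Double.parseDouble(parts[1].trim()) };
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

One degree of longitude at the equator comes out at about 111.19 km, a quick sanity check on the formula. The right cutoff would depend on how precisely addresses are geocoded at a given site.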
> 
> Cheers,
> 
> -Burke
> 
> On Mon, Feb 20, 2012 at 3:15 AM, Suranga Kasthurirathne 
> <[email protected]> wrote:
> 
> Hi everyone,
> 
> We have just committed some important changes to the patientMatching module. 
> 
> These changes are part of our efforts to move user-generated patient matching 
> report details from a flat file system to a database. We have not completed 
> the entire effort, but have made a reasonable amount of progress. At the 
> moment, the enhancements we made will let you store user-defined 
> patient matching strategies in the database.  Efforts to store the entire 
> user-generated report in the database are still underway.
> Please note that following these changes, user-defined strategies are now 
> stored in database tables, and not in the existing config.xml file.
> 
> However, do note that this effort is still at an experimental level, so do be 
> careful if you decide to use the latest build.
> 
> --
> On behalf of the PatientMatching team,
> Suranga
> 
> Click here to unsubscribe from OpenMRS Developers' mailing list
> 
> -- 
> Best Regards,
> 
> Suranga


_________________________________________

To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to 
[email protected] with "SIGNOFF openmrs-devel-l" in the  body (not 
the subject) of your e-mail.

