Great discussion. I agree wholeheartedly with the notions of CustomColumn (or maybe call it "DerivedColumn" or "DerivedTrait"?) and MatchSimilarityAlgorithm (although I might call this ColumnSimilarityAlgorithm, FieldSimilarityAlgorithm or TraitSimilarityAlgorithm -- 'traits' being those data elements that describe an individual, like name, date of birth, gender, geographic location, etc.).
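A minimal sketch of how those two abstractions might look under the "trait" naming. The names DerivedTrait and TraitSimilarityAlgorithm are just the renames floated above, and the method signatures mirror the CustomColumn and MatchSimilarityAlgorithm proposals quoted further down the thread. Person, BirthYearTrait, ExactTraitSimilarity and TraitMatcher are hypothetical stand-ins for illustration; a real module would presumably work with the OpenMRS Patient class.

```java
// Hypothetical stand-in for the OpenMRS Patient class, for illustration only.
final class Person {
    final String givenName;
    final String birthDate; // assumed ISO format, e.g. "1980-05-17"

    Person(String givenName, String birthDate) {
        this.givenName = givenName;
        this.birthDate = birthDate;
    }
}

// A trait is any data element that describes an individual.
interface DerivedTrait {
    String getName();
    String getValue(Person person);
}

// Compares two trait values and returns a score between 0 and 1.
interface TraitSimilarityAlgorithm {
    String getName();
    double getMatchSimilarity(String val1, String val2);
}

// Example derived trait: the birth year, taken from the full birth date.
final class BirthYearTrait implements DerivedTrait {
    public String getName() { return "Birth Year"; }

    public String getValue(Person person) {
        return person.birthDate.substring(0, 4);
    }
}

// Example algorithm: 1.0 when the values are equal ignoring case, else 0.0.
final class ExactTraitSimilarity implements TraitSimilarityAlgorithm {
    public String getName() { return "Exact Match"; }

    public double getMatchSimilarity(String val1, String val2) {
        return val1 != null && val1.equalsIgnoreCase(val2) ? 1.0 : 0.0;
    }
}

// Glue code: score two people on one trait with one algorithm.
final class TraitMatcher {
    static double score(DerivedTrait trait, TraitSimilarityAlgorithm algorithm,
                        Person a, Person b) {
        return algorithm.getMatchSimilarity(trait.getValue(a), trait.getValue(b));
    }
}
```

Keeping the trait (what value to extract) separate from the algorithm (how to compare two values) means either side can be swapped without touching the other.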
One thought I'd contribute regarding phonetic transformation: the design decision point may not be ease of development (e.g., JavaScript versus Java versus Groovy -- I have no particular preference), but raw performance. Depending on the phonetic transformation algorithm -- and as you all likely know there are a few: Soundex, NYSIIS, Metaphone, Double Metaphone, each with its advantages and disadvantages among various ethnic groups -- the computational cost to create that transformation may be appreciable, potentially creating unacceptable execution times for the matching process. We've intermittently debated the merits of on-the-fly transformation versus creating these derived fields when committing the original values.

So I would ask the community: Should "derived traits" or "custom columns" -- such as phonetic transformations -- be created at commit time? Or created on-the-fly at match time? Would one possible approach to answering this question be to conduct performance testing for each particular transformation or derivation using real-world data?

Another important question -- research, anyone? ;) -- that may or may not have been answered, particularly for African names: Which phonetic transformations work best (e.g., produce the maximum matching accuracy) for a particular ethnic group?

Best,

- Shaun

On Feb 28, 2012, at 6:08 PM, Dave Thomas wrote:

> Thanks Burke.
>
> Actually, the phonetics encoders would be great candidates for scripting, because ultimately, they're just a set of regex-driven string substitutions...
>
> d
>
> On Mon, Feb 27, 2012 at 6:35 PM, Burke Mamlin wrote:
> Dave,
>
> Our immediate goals at AMPATH are to handle multiple identifiers and possibly do distance comparisons of GPS coordinates. Yours is another good example of a handy matching algorithm. The goal of creating an interface for the algorithm is to allow developers to create any algorithm that meets their needs.
> So, instead of trying to solve every problem in the patient matching module, we'll make it easy for devs to add customizations. I've spoken with Shaun Grannis and he likes the idea.
>
> While we'll start by introducing custom algorithms, I'd also like to do the same on the front end -- i.e., allow for a custom column (per Shaun, this should be called a "custom trait"), so devs could easily throw classes on the classpath to expand on the list of available traits (columns) as well as the list of possible methods of matching data. This would open the door for popular customizations to eventually be packaged with the module. And, I'd love to see a set of customizations use the scripting interface built into Java (JSR-223) to create custom traits & algorithms that could bind the patient and API to a scripting context and execute user-written scripts for truly customizable matching (of course it would need to be efficient, but javax.script defines a CompiledScript… so maybe it's not just a dream). Imagine being able to write your namephonetics algorithm in Groovy or JavaScript and use it within the patient matching module. ;-)
>
> Small steps...
>
> -Burke
>
> On Mon, Feb 27, 2012 at 5:16 PM, Dave Thomas wrote:
> Hi. I haven't followed this discussion very closely, but if you're creating interfaces in the patient matching module, I just wanted to register the use case that it would be really cool if there were interfaces exposed in patient matching that would allow namephonetics to register a phonetics equivalence test as part of the matching criteria.
>
> (meaning finding names that are phonetically the same, even if spelled differently, according to the phonetics structure of the local language)
>
> d
>
> On Mon, Feb 27, 2012 at 10:33 AM, Burke Mamlin wrote:
> All we need (at least to get started) is for the algorithm that is already stored in the database to allow for custom values -- e.g., "ke.or.ampath.matching.FooBarMatchSimilarityAlgorithm". We will supply our own algorithms on the classpath. As long as the patient matching module calls algorithm.getMatchSimilarity(v1,v2) when comparing values, we're good.
>
> Eventually, we'd like to do something similar with column definitions, e.g. something like:
>
>     interface CustomColumn {
>         public String getName();
>         public String getValue(Patient patient);
>     }
>
> But we can focus on the algorithm first, since it will solve our immediate problems. As long as we can get agreement on a patch, then AMPATH can apply the patch locally to move forward with custom algorithms while the patient matching module is being updated.
>
> Cheers,
>
> -Burke
>
> On Mon, Feb 27, 2012 at 1:03 PM, Suranga Kasthurirathne wrote:
> Hi,
>
> This sounds quite practical to me. I understand what you're trying to do. (Of course, I'm not THE top expert; we'll need to consult with Dr. Shaun too.) In regard to improving the MatchResult class, I think that we should follow the second plan that Dr. Burke suggested. However, the question is, do we need database support for this functionality? I don't think so.
>
> In response to Jeremy's questions, I feel that these changes can be fitted into the current design quite easily. And also (pending Dr. Shaun's comments) I feel that these changes can go into the trunk itself.
>
> Looking at the possibility of custom columns: once again, this is possible, but not quite easy. It will need some refactoring.
> And also, where are we going to store information on the custom columns? Be it either a flat file or a database, we're still going to need some major changes. And finally, will creating custom columns affect our calculations in any way? There are a lot of calculations happening under the hood, so we'll need to check that out as well.
>
> I'd like to help you out with this in any way that I can.
>
> On Mon, Feb 27, 2012 at 9:11 PM, Suranga Kasthurirathne wrote:
> Hi,
>
> I'm on a bus right now, so I could not go through the mails properly yet. I will try to come up with some sensible comments as soon as possible :-)
>
> On 27 Feb 2012 20:22, "Burke Mamlin" wrote:
> More specifically, we'd like to refactor MatchResult.getSimilarityScore(String demographic) so that it uses an interface instead of a switch statement to match values.
>
> For example:
>
>     interface MatchSimilarityAlgorithm {
>         // Human-readable name, e.g. for display in the UI
>         public String getName();
>         // Similarity score for the two values being compared
>         public double getMatchSimilarity(String val1, String val2);
>     }
>
> then creating classes like:
>
>     class ExactMatchSimilarity implements MatchSimilarityAlgorithm {
>         public String getName() {
>             return "Exact Match";
>         }
>         public double getMatchSimilarity(String val1, String val2) {
>             return StringMatch.getExactMatchSimilarity(val1, val2);
>         }
>     }
>
>     class LCSMatchSimilarity implements MatchSimilarityAlgorithm {
>         public String getName() {
>             return "Longest Common Subsequence";
>         }
>         public double getMatchSimilarity(String val1, String val2) {
>             return StringMatch.getLCSMatchSimilarity(val1, val2);
>         }
>     }
>
> etc...
>
> This would make algorithm a MatchSimilarityAlgorithm instead of a String. The switch statement in MatchResult.getSimilarityScore(String) can then be replaced with:
>
>     return algorithm.getMatchSimilarity(val1, val2);
>
> This would allow us to introduce new algorithms.
> Of course, the UI would need some minor refactoring to use the MatchSimilarityAlgorithm.getName() method to get names of algorithms.
>
> If those changes seem too bold at first, then a quicker step in that direction would be to introduce the interface, allow algorithm to be a class name, and refactor the last (default) case in the switch statement to something like:
>
>     default:
>         try {
>             Class<?> clazz = Class.forName(algorithm);
>             // Test the loaded class itself, then instantiate it
>             if (MatchSimilarityAlgorithm.class.isAssignableFrom(clazz)) {
>                 MatchSimilarityAlgorithm custom = (MatchSimilarityAlgorithm)
>                         clazz.getDeclaredConstructor().newInstance();
>                 return custom.getMatchSimilarity(val1, val2);
>             }
>         } catch (ReflectiveOperationException e) {
>             // Unknown or unusable class: fall through to a score of 0
>         }
>         return 0;
>
> This wouldn't fully customize algorithms, but would at least allow us to sneak in our own algorithms during matching.
>
> Cheers,
>
> -Burke
>
> On Mon, Feb 27, 2012 at 8:49 AM, Jeremy Keiper wrote:
> Suranga, James, Burke, Shaun, etc -
>
> AMPATH has a development-backed immediate interest in these custom column definitions for acquiring and comparing attributes of patient data. If there is any way that this can be fitted into the current design, could we get some direction on how it could work and whether the Patient Matching Module authors would like to incorporate it into trunk or allow us to branch it for those purposes?
>
> Thanks!
>
> Jeremy Keiper
> OpenMRS Core Developer
> AMPATH / IU-Kenya Support
>
> On Tue, Feb 21, 2012 at 10:00 PM, Rajib Sengupta wrote:
> Hello All,
> As a new implementer of OpenMRS, I cannot resist chiming in on this discussion.
>
> If the web UI of patient matching can be made configurable, it will indeed be a great feature. Primarily, this will help implementation teams that are low on pure developer resources, and implementations in new geographic regions where patient matching may be based on different patient attributes that are not available in the current UI out of the box.
>
> Again, this is just our opinion, and we want to share it with the development team so that development items can be prioritized from a user usability perspective.
>
> Thanks,
> Rajib
>
> From: Suranga Kasthurirathne
> Sent: Tuesday, February 21, 2012 12:21 PM
> Subject: Re: [OPENMRS-DEV] Latest improvements to the Patient Matching module
>
> Hi,
>
> I agree. Right now, the web-based UI that you get by loading the module doesn't actually let you make use of the extensive (really really really extensive) functionality the patientmatching module can offer. In fact, sometimes I feel that the web UI is trying to hide its true awesomeness. If the web UI let you make use of all the extensive functionality that the module offers, it would be one of our top favorite modules.
>
> That said, I'm not sure which of the two above-mentioned efforts should get priority over the other. I'm OK with whatever the experts agree on :-)
>
> On Tue, Feb 21, 2012 at 11:24 AM, Darius Jazayeri wrote:
> It certainly has potential as a GSoC project.
>
> That said, my main concern with the Patient Matching module has been that the UI is not easily approachable by anyone who hasn't built up some domain knowledge on patient matching. I believe this has been improved, though I'm not sure how much. Making it usable by an admin with only basic OpenMRS knowledge would be a higher priority for me. (Unless that's already been done!)
>
> -Darius
>
> On Mon, Feb 20, 2012 at 8:50 PM, Suranga Kasthurirathne wrote:
> Hi,
>
> To the best of my knowledge, the Patient Matching module provides multiple ways of reading in data (txt files, one or two databases) but doesn't really provide you with an interface for data columns. So right now, it seems to me that we really can't add derived columns like the ones you and Darius mentioned.
>
> However, it does provide an interface for Analyzers (which is comparable to algorithms), so users can write up their own analyzers if they wish to do so. I understand the functionality that you are talking about, and I realize that it's an important and useful feature. Possibly, this will be easier to implement once the database changes are in.
>
> I will talk to James regarding this, and get his feedback on my comments....
>
> @Dr. Burke, @Darius, mmm.... does this sound like a good GSoC idea to you? Maybe Dr. Grannis will also be interested :-)
>
> Thanks and best regards,
> Suranga
>
> On Tue, Feb 21, 2012 at 5:44 AM, Darius Jazayeri wrote:
> What Burke describes, with the ability to write those as Groovy and store them in the DB, would be pretty neat. :-)
>
> -Darius
>
> On Mon, Feb 20, 2012 at 2:08 PM, Burke Mamlin wrote:
> Suranga,
>
> Does the patient matching module provide interfaces for data source columns and/or algorithms so implementations could define custom column(s) and/or custom comparison algorithm(s)? For example, I'm imagining something like:
>
>     interface DataSourceColumn {
>         public String name();
>         public String getValue(Patient patient);
>     }
>
>     interface Algorithm {
>         public String getName();
>         public float match(String a, String b);
>     }
>
> Assuming the out-of-the-box patient demographics implement the DataSourceColumn interface, it would be easy to add new derived columns (e.g., a region-specific soundex algorithm, a column that concatenates all patient identifiers, or a column that combines latitude & longitude from address). And assuming that "Exact Match" and other out-of-the-box comparison algorithms implemented the Algorithm interface, then it would be easy to add new custom algorithms (e.g., matching GPS coordinates based on computed distance).
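The GPS example could be sketched roughly as follows. This is only an illustration: the Algorithm interface is the one imagined in the message above, while GpsDistanceAlgorithm, the "latitude,longitude" string format, and the distance cutoff are assumptions made for the example and are not part of the module. It uses the haversine great-circle distance and scales it linearly into a score between 0 and 1.

```java
// The comparison interface as imagined in the message above.
interface Algorithm {
    String getName();
    float match(String a, String b);
}

// Hypothetical: scores two "latitude,longitude" strings by how close they are.
final class GpsDistanceAlgorithm implements Algorithm {
    private static final double EARTH_RADIUS_KM = 6371.0;
    private final double maxDistanceKm; // assumed cutoff: at or beyond this, score is 0

    GpsDistanceAlgorithm(double maxDistanceKm) {
        this.maxDistanceKm = maxDistanceKm;
    }

    public String getName() { return "GPS Distance"; }

    public float match(String a, String b) {
        double[] p = parse(a);
        double[] q = parse(b);
        if (p == null || q == null) return 0f; // unparseable values never match
        double km = haversineKm(p[0], p[1], q[0], q[1]);
        return (float) Math.max(0.0, 1.0 - km / maxDistanceKm);
    }

    // Great-circle distance between two points given in decimal degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        // Clamp to guard against rounding slightly above 1 for antipodal points
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, h)));
    }

    // Returns {latitude, longitude}, or null when the value is not "lat,lon".
    private static double[] parse(String value) {
        if (value == null) return null;
        String[] parts = value.split(",");
        if (parts.length != 2) return null;
        try {
            return new double[] {
                Double.parseDouble(parts[0].trim()),
                Double.parseDouble(parts[1].trim())
            };
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

With a cutoff of 10 km, for instance, `new GpsDistanceAlgorithm(10.0).match("0.5143,35.2698", "0.5143,35.2698")` scores identical coordinates as a full match, and anything 10 km or more apart scores 0. A linear falloff is the simplest choice; whether it suits real address data is an open question.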
>
> Cheers,
>
> -Burke
>
> On Mon, Feb 20, 2012 at 3:15 AM, Suranga Kasthurirathne wrote:
> Hi everyone,
>
> We have just committed some important changes to the patientMatching module.
>
> These changes are part of our efforts to move user-generated patient matching report details from a flat file system to a database. We have not completed the entire effort, but have made a reasonable amount of progress. At the moment, the enhancements we made will let you store user-defined patient matching strategies in the database. Efforts to store the entire user-generated report in the database are still underway.
>
> Please note that following these changes, user-defined strategies are now stored in database tables, and not in the existing config.xml file.
>
> However, do note that this effort is still at an experimental level, so do be careful if you decide to use the latest build.
>
> --
> On behalf of the PatientMatching team,
> Suranga
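Shaun's commit-time versus match-time question at the top of the thread could be probed with a small harness along these lines. It is a rough sketch under stated assumptions, not a benchmark: PhoneticTimingSketch and its methods are hypothetical, the names are a made-up sample rather than real-world data, the map stands in for a stored derived column, and the timing ignores JIT warm-up. The encoder is classic American Soundex; other transformations (NYSIIS, Metaphone) would slot into the same two strategies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PhoneticTimingSketch {

    // Classic American Soundex: first letter plus three digits, zero-padded.
    static String soundex(String name) {
        String codes = "01230120022455012623010202"; // digit for each of A..Z
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char raw : name.toUpperCase().toCharArray()) {
            if (raw < 'A' || raw > 'Z') continue;   // skip non-letters
            char code = codes.charAt(raw - 'A');
            if (out.length() == 0) {                // keep the first letter as-is
                out.append(raw);
                last = code;
                continue;
            }
            if (raw == 'H' || raw == 'W') continue; // H and W do not separate codes
            if (code != '0' && code != last) out.append(code);
            last = code;                            // vowels reset the previous code
            if (out.length() == 4) break;
        }
        while (out.length() < 4) out.append('0');
        return out.toString();
    }

    // Strategy 1: encode both names at match time, on every comparison.
    static boolean matchOnTheFly(String a, String b) {
        return soundex(a).equals(soundex(b));
    }

    // Strategy 2: encode once at "commit time" and keep the derived value.
    static Map<String, String> commit(List<String> names) {
        Map<String, String> derived = new HashMap<>();
        for (String n : names) derived.put(n, soundex(n));
        return derived;
    }

    static boolean matchPrecomputed(Map<String, String> derived, String a, String b) {
        return derived.get(a).equals(derived.get(b));
    }

    public static void main(String[] args) {
        // Made-up sample; a meaningful test would use real-world data.
        List<String> names = Arrays.asList("Robert", "Rupert", "Ashcraft",
                "Tymczak", "Pfister", "Kamau", "Kamao", "Wanjiru", "Wanjiro", "Otieno");
        Map<String, String> derived = commit(names);
        int rounds = 10_000;

        long start = System.nanoTime();
        int onTheFlyHits = 0;
        for (int r = 0; r < rounds; r++)
            for (String a : names)
                for (String b : names)
                    if (matchOnTheFly(a, b)) onTheFlyHits++;
        long onTheFlyNanos = System.nanoTime() - start;

        start = System.nanoTime();
        int precomputedHits = 0;
        for (int r = 0; r < rounds; r++)
            for (String a : names)
                for (String b : names)
                    if (matchPrecomputed(derived, a, b)) precomputedHits++;
        long precomputedNanos = System.nanoTime() - start;

        System.out.printf("on-the-fly: %d ms, precomputed: %d ms, same hits: %b%n",
                onTheFlyNanos / 1_000_000, precomputedNanos / 1_000_000,
                onTheFlyHits == precomputedHits);
    }
}
```

Both strategies produce the same matches; what differs is where the encoding cost is paid. Precomputing trades storage and a re-encode whenever a name changes for cheaper comparisons, and how much that matters for a given transformation is exactly what a test on real data would have to show.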
To unsubscribe from OpenMRS Developers' mailing list, send an e-mail to [email protected] with "SIGNOFF openmrs-devel-l" in the body (not the subject) of your e-mail.

