Have you taken a look at org.apache.ctakes.utils.xcas_comparison.Compare?

Without looking at the source, I've forgotten most of the little I once knew about it. But we had suggested it in cTAKES 1.0 to help people compare at least some parts of two CASes. Maybe you will find some part of it helpful?
-- James

________________________________________
From: [email protected] [[email protected]] on behalf of ASF subversion and git services (JIRA) [[email protected]]
Sent: Wednesday, July 17, 2013 8:35 PM
To: [email protected]
Subject: [jira] [Commented] (CTAKES-217) create a tool for "diff"-ing two CASes

    [ https://issues.apache.org/jira/browse/CTAKES-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711911#comment-13711911 ]

ASF subversion and git services commented on CTAKES-217:
--------------------------------------------------------

Commit 1504339 from [~steven.bethard] in branch 'ctakes/trunk'
[ https://svn.apache.org/r1504339 ]

CTAKES-217: Revises CompareFeatureStructures to use java-diff-utils. The search for FeatureStructure equality is the same, but now nested uses of DiffUtils produce what is hopefully better output. In particular, there should now be more useful output for the case where annotations have been inserted or deleted (not just changed).

> create a tool for "diff"-ing two CASes
> --------------------------------------
>
>                 Key: CTAKES-217
>                 URL: https://issues.apache.org/jira/browse/CTAKES-217
>             Project: cTAKES
>          Issue Type: New Feature
>            Reporter: Steven Bethard
>
> It would be handy to be able to easily get a "diff" of two CASes. Some
> possibilities:
> (1) Just diff the XMIs. This doesn't work very well because the IDs are
> typically different in different XMIs generated from the same annotations.
> (2) Output all annotations, using their .toString(), and diff that file using
> a standard diff algorithm. This might mostly work if we could guarantee a
> consistent ordering of the annotations in the CAS. (That's easy to do for
> Annotations, but not always possible for TOPs.) But some things aren't
> displayed in the .toString(), e.g. the values inside FSArrays and FSLists.
>
> In r1504269, I added CompareFeatureStructures which isn't either of these,
> but is a bit closer to (2). It sorts annotations by offset (and for TOPs,
> looks through their features to find offsets), and then compares each pair of
> FeatureStructures by walking the tree of their features. I'm mostly happy
> with how it handles the comparison of two FeatureStructures (though
> .toString() is a bit hacky).
>
> The main issue is that it doesn't really do anything useful if you have
> different numbers of annotations in the two CASes. It just prints a message
> saying that the numbers are different. Instead, it should be able to identify
> insertions and deletions of annotations. Probably there's a way to do this
> with java-diff-utils, though I wasn't able to figure one out on my first
> attempt.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
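
For anyone following the thread, here is a rough sketch of the general idea in the issue: sort the annotations by offset, render each one as a string, and run a sequence diff so that insertions and deletions show up instead of just a count mismatch. This is not the CompareFeatureStructures code and it does not use java-diff-utils or UIMA. The class AnnotationDiffSketch, the Span stand-in for an annotation, and the hand-rolled longest-common-subsequence diff are all assumptions made up for illustration so the example runs with only the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: Span stands in for a UIMA Annotation, and the
// LCS diff stands in for what a library such as java-diff-utils would do.
class AnnotationDiffSketch {

    /** Minimal stand-in for an annotation: a type name plus character offsets. */
    static final class Span implements Comparable<Span> {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        // Order by begin offset, then end offset, then type name, so that two
        // CASes holding the same annotations always render in the same order.
        @Override
        public int compareTo(Span o) {
            if (begin != o.begin) return Integer.compare(begin, o.begin);
            if (end != o.end) return Integer.compare(end, o.end);
            return type.compareTo(o.type);
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "]";
        }
    }

    /** Sorts a copy of the spans by offset and renders each one as a string. */
    static List<String> render(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        Collections.sort(sorted);
        List<String> out = new ArrayList<>();
        for (Span s : sorted) out.add(s.toString());
        return out;
    }

    /** LCS-based diff: "  " = in both, "- " = only in a, "+ " = only in b. */
    static List<String> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) {
                out.add("  " + a.get(i)); i++; j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a.get(i)); i++;   // deleted from the first list
            } else {
                out.add("+ " + b.get(j)); j++;   // inserted in the second list
            }
        }
        while (i < n) out.add("- " + a.get(i++));
        while (j < m) out.add("+ " + b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<Span> first = Arrays.asList(
                new Span("Token", 6, 10), new Span("Token", 0, 5), new Span("Sentence", 0, 10));
        List<Span> second = Arrays.asList(
                new Span("Token", 0, 5), new Span("Sentence", 0, 10),
                new Span("Token", 6, 8), new Span("Token", 9, 10));
        for (String line : diff(render(first), render(second))) {
            System.out.println(line);
        }
    }
}
```

With the two toy lists in main, Token[0,5] and Sentence[0,10] come out as unchanged, Token[6,10] as a deletion, and Token[6,8] and Token[9,10] as insertions. Comparing rendered strings has the same weakness the issue mentions for .toString(): anything not in the string (for example, values inside FSArrays) is invisible to the diff, so a real tool would still need the feature-by-feature comparison for matched pairs.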
