[emf-dev] EMF Compare Name Similarity

Simon Fri, 05 Jul 2013 05:54:56 -0700

Hi,

At the moment I am reverse engineering EMF Compare, and I have already read a lot of material. I think I have found some inconsistencies in that material and want to ask whether I understand things correctly.


These are the statements in question:

a) According to [1], EMF Compare uses the Levenshtein distance for string similarity.
b) According to [3], EMF Compare 1.3 is similar to [4]. In [4], the Dice coefficient (although it is not named explicitly) is used for string similarity.
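To make the difference between statements a) and b) concrete, here is a minimal Java sketch of both measures: a Levenshtein distance (normalized into a similarity) and a Dice coefficient over character bigrams. It is a generic textbook version, not the code of NameSimilarity [2]; the class name StringSimilarity, the normalization by the longer string's length, and the handling of strings shorter than two characters are my own assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper contrasting two string-similarity measures (not EMF Compare code). */
public final class StringSimilarity {

    private StringSimilarity() {}

    /** Classic Levenshtein edit distance: insert, delete and substitute each cost 1. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumption: distance normalized by the longer length, giving a value in [0, 1]. */
    public static double levenshteinSimilarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    /** Adjacent character pairs (bigrams); duplicates are kept. */
    static List<String> bigrams(String s) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            pairs.add(s.substring(i, i + 2));
        }
        return pairs;
    }

    /** Dice coefficient on bigram multisets: 2 * |X intersect Y| / (|X| + |Y|). */
    public static double dice(String a, String b) {
        List<String> x = bigrams(a);
        List<String> y = new ArrayList<>(bigrams(b));
        int total = x.size() + y.size();
        if (total == 0) {
            return a.equals(b) ? 1.0 : 0.0; // assumption for strings shorter than 2 chars
        }
        int shared = 0;
        for (String pair : x) {
            if (y.remove(pair)) { // each bigram of b may be matched at most once
                shared++;
            }
        }
        return 2.0 * shared / total;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // 3
        System.out.println(dice("night", "nacht"));           // 0.25
    }
}
```

For "kitten" and "sitting" the edit distance is 3, while "night" and "nacht" share only the bigram "ht", giving a Dice value of 2 × 1 / 8 = 0.25. The Dice variant ignores where in the string a pair occurs, which is one reason the two measures can rank candidates differently.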



After a code review of [2] and [5], I came to the following conclusions:

I) EMF Compare 1.x and 2.x use the Dice coefficient with bigrams for string similarity.
II) EMF Compare 2.x uses the Longest Common Subsequence (LCS) to determine changes in multi-valued references of EObjects.

III) Statement a) is wrong or outdated.
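As a rough illustration of conclusion II, here is a generic dynamic-programming LCS over two ordered lists, such as the old and new values of a multi-valued reference, with the values outside the LCS reported as moved or added. It is not the code of DiffUtil [5]; the class name ListLcs, comparing values with equals(), and the move/add classification in main are hypothetical simplifications of my own, and the classification assumes the lists contain no duplicate values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: LCS of two ordered lists (not EMF Compare code). */
public final class ListLcs {

    private ListLcs() {}

    public static <T> List<T> lcs(List<T> left, List<T> right) {
        int n = left.size();
        int m = right.size();
        // table[i][j] = LCS length of the suffixes left[i..] and right[j..]
        int[][] table = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (left.get(i).equals(right.get(j))) {
                    table[i][j] = table[i + 1][j + 1] + 1;
                } else {
                    table[i][j] = Math.max(table[i + 1][j], table[i][j + 1]);
                }
            }
        }
        // Walk the table forward to recover one LCS (ties advance in the left list)
        List<T> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (left.get(i).equals(right.get(j))) {
                result.add(left.get(i));
                i++;
                j++;
            } else if (table[i + 1][j] >= table[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> origin = Arrays.asList("a", "b", "c", "d");
        List<String> changed = Arrays.asList("b", "a", "c", "e", "d");
        List<String> common = lcs(origin, changed);
        System.out.println("LCS: " + common);
        // Values outside the LCS changed position or are new (assumes no duplicates)
        for (String value : changed) {
            if (!common.contains(value)) {
                System.out.println((origin.contains(value) ? "moved: " : "added: ") + value);
            }
        }
        for (String value : origin) {
            if (!changed.contains(value)) {
                System.out.println("deleted: " + value);
            }
        }
    }
}
```

With these inputs the sketch reports the LCS [b, c, d], so "a" is reported as moved and "e" as added. The LCS is not unique ([a, c, d] has the same length), so the tie-breaking rule in the backtracking step decides which element is treated as moved.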

I would appreciate it if someone could confirm my conclusions.




References:

[1] http://eclipsesummit.org/summiteurope2006/presentations/ESE2006-EclipseModelingSymposium10_EMFCompareUtility.pdf

[2] http://git.eclipse.org/c/emfcompare/org.eclipse.emf.compare.git/tree/plugins/org.eclipse.emf.compare.match/src/org/eclipse/emf/compare/match/internal/statistic/NameSimilarity.java?h=1.3

[3] http://wiki.eclipse.org/EMF_Compare/FAQ/1.3#What_kind_of_.22strategies.22_use_EMF_compare_.3F

[4] http://ase.cs.uni-due.de/olbib/p54-xing-241.pdf

[5] http://git.eclipse.org/c/emfcompare/org.eclipse.emf.compare.git/tree/plugins/org.eclipse.emf.compare/src/org/eclipse/emf/compare/utils/DiffUtil.java?h=2.1

_______________________________________________
emf-dev mailing list
[email protected]
https://dev.eclipse.org/mailman/listinfo/emf-dev
