Hi,
I am currently reverse engineering EMF Compare and have already read
a lot of material on it. I think I have found some inconsistencies in
that material and want to ask whether I understand things correctly.
These are the statements in question:
a) According to [1] EMF Compare uses Levenshtein distance for string
similarity.
b) According to [3], the strategies used by EMF Compare 1.3 are similar
to those described in [4]. In [4], the Dice coefficient (although it is
not named explicitly) is used for string similarity.
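To make clear what I mean by "Dice coefficient with bi-grams", here is a
minimal sketch of the textbook formula 2 * |X ∩ Y| / (|X| + |Y|), where X
and Y are the character bi-grams of the two strings. This is my own
illustration and not code taken from EMF Compare: the class name
DiceSketch, the multiset treatment of repeated bi-grams, the
case-sensitive comparison, and the handling of very short strings are all
assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of EMF Compare.
final class DiceSketch {

    // Splits a string into its overlapping character pairs, e.g. "abc" -> [ab, bc].
    static List<String> bigrams(String s) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }

    // Dice coefficient over bi-gram multisets: 2 * shared / (|X| + |Y|).
    static double dice(String a, String b) {
        if (a.equals(b)) {
            return 1.0; // assumption: identical strings always score 1
        }
        List<String> x = bigrams(a);
        List<String> y = bigrams(b);
        if (x.isEmpty() || y.isEmpty()) {
            return 0.0; // assumption: strings shorter than 2 chars share nothing
        }
        int total = x.size() + y.size();
        int shared = 0;
        for (String pair : x) {
            // Removing the match from y makes each bi-gram of y count at most once.
            if (y.remove(pair)) {
                shared++;
            }
        }
        return 2.0 * shared / total;
    }
}
```

For example, "night" and "nacht" each have four bi-grams and share only
"ht", so the sketch yields 2 * 1 / 8 = 0.25. A Levenshtein-based measure
would instead count single-character edits, which is why the two
descriptions are not interchangeable.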
After a code review of [2] and [5], I came to the following conclusions:
I) EMF Compare 1.x and 2.x use the Dice coefficient with bi-grams for
string similarity
II) EMF Compare 2.x uses the Longest Common Subsequence to determine
changes in multi-references of EObjects
III) Statement a) is wrong or outdated.
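For conclusion II), this is the kind of computation I have in mind: a
minimal sketch of the classic dynamic-programming Longest Common
Subsequence over two lists of values. It is my own illustration and not
the code from [5]: the class name LcsSketch and the use of
Objects.equals for comparing elements are assumptions, and EMF Compare
may well compare elements differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical helper class, not part of EMF Compare.
final class LcsSketch {

    // Returns one longest common subsequence of a and b (order preserved).
    static <T> List<T> longestCommonSubsequence(List<T> a, List<T> b) {
        int n = a.size();
        int m = b.size();
        // len[i][j] = LCS length of the first i elements of a and first j of b.
        int[][] len = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (Objects.equals(a.get(i - 1), b.get(j - 1))) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i - 1][j], len[i][j - 1]);
                }
            }
        }
        // Walk back through the table to collect the elements of one LCS.
        List<T> result = new ArrayList<>();
        int i = n;
        int j = m;
        while (i > 0 && j > 0) {
            if (Objects.equals(a.get(i - 1), b.get(j - 1))) {
                result.add(a.get(i - 1));
                i--;
                j--;
            } else if (len[i - 1][j] >= len[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }
        Collections.reverse(result);
        return result;
    }
}
```

With the old values [A, B, C, D] and the new values [A, C, D, E], the
sketch returns [A, C, D]. As I understand it, the values outside that
subsequence (here B and E) would then be the candidates for reported
changes in the multi-valued reference.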
I would appreciate it if someone could confirm my conclusions.
References:
[1]
http://eclipsesummit.org/summiteurope2006/presentations/ESE2006-EclipseModelingSymposium10_EMFCompareUtility.pdf
[2]
http://git.eclipse.org/c/emfcompare/org.eclipse.emf.compare.git/tree/plugins/org.eclipse.emf.compare.match/src/org/eclipse/emf/compare/match/internal/statistic/NameSimilarity.java?h=1.3
[3]
http://wiki.eclipse.org/EMF_Compare/FAQ/1.3#What_kind_of_.22strategies.22_use_EMF_compare_.3F
[4] http://ase.cs.uni-due.de/olbib/p54-xing-241.pdf
[5]
http://git.eclipse.org/c/emfcompare/org.eclipse.emf.compare.git/tree/plugins/org.eclipse.emf.compare/src/org/eclipse/emf/compare/utils/DiffUtil.java?h=2.1