Package: python3-textdistance
Version: 4.2.2-4
Severity: normal
Tags: upstream
The autopkgtest failed on this package when testing against a new
upload of pyxdameraulevenshtein:
________________________ test_qval[DamerauLevenshtein] _________________________
alg = 'DamerauLevenshtein'
@pytest.mark.external
> @pytest.mark.parametrize('alg', libraries.get_algorithms())
tests/test_external.py:41:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
left = '01', right = '120', alg = 'DamerauLevenshtein'
@pytest.mark.external
@pytest.mark.parametrize('alg', libraries.get_algorithms())
@hypothesis.given(
left=hypothesis.strategies.text(min_size=1),
right=hypothesis.strategies.text(min_size=1),
)
def test_qval(left, right, alg):
for lib in libraries.get_libs(alg):
conditions = lib.conditions or {}
internal_func = getattr(textdistance, alg)(external=False,
**conditions)
external_func = lib.get_function()
# algorithm doesn't support q-grams
if not hasattr(internal_func, 'qval'):
continue
for qval in (None, 1, 2, 3):
internal_func.qval = qval
# if qval unsopporting already set for lib
s1, s2 = internal_func._get_sequences(left, right)
if not lib.check_conditions(internal_func, s1, s2):
continue
# test
int_result = internal_func(left, right)
s1, s2 = lib.prepare(s1, s2)
ext_result = external_func(s1, s2)
> assert isclose(int_result, ext_result), str(lib)
E AssertionError: jellyfish.damerau_levenshtein_distance
E assert False
E + where False = isclose(3, 2)
tests/test_external.py:65: AssertionError
---------------------------------- Hypothesis ----------------------------------
Falsifying example: test_qval(
left='01', right='120', alg='DamerauLevenshtein',
)
It turns out that there are two different Damerau-Levenshtein
distances: the restricted version (also known as the optimal string
alignment distance) and the unrestricted version. The Wikipedia page
describes exactly the example found by hypothesis (just with different
characters: 0=b, 1=a, 2=c):
https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
(This example is taken from the Boytsov paper referenced there.)
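textdistance offers three external libraries which calculate "the"
Damerau-Levenshtein distance, alongside its own internal
implementation. Here is the situation:
  abydos:                 unrestricted
  jellyfish:              unrestricted
  pyxdameraulevenshtein:  restricted
  textdistance internal:  restricted
This causes the test to fail when the inputs are '01' and '120': the
internal (restricted) implementation returns 3, whereas jellyfish
(unrestricted) returns 2.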
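To make the discrepancy concrete, here is a minimal sketch of both
variants in plain Python. It is not the code of textdistance or of any
of the libraries above; the function names osa_distance and
dl_distance are my own, and the unrestricted version follows the
dynamic-programming formulation given on the Wikipedia page.

```python
def osa_distance(s1, s2):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent characters only.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def dl_distance(s1, s2):
    """Unrestricted Damerau-Levenshtein distance."""
    maxdist = len(s1) + len(s2)
    # The table is shifted by one in both directions so that row 0 and
    # column 0 can hold the sentinel value maxdist.
    d = [[maxdist] * (len(s2) + 2) for _ in range(len(s1) + 2)]
    for i in range(len(s1) + 1):
        d[i + 1][1] = i
    for j in range(len(s2) + 1):
        d[1][j + 1] = j
    last_row = {}  # last (1-based) position in s1 where each character occurred
    for i in range(1, len(s1) + 1):
        last_col = 0  # last (1-based) column in this row where the characters matched
        for j in range(1, len(s2) + 1):
            k = last_row.get(s2[j - 1], 0)
            l = last_col
            if s1[i - 1] == s2[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,      # match or substitution
                d[i + 1][j] + 1,     # insertion
                d[i][j + 1] + 1,     # deletion
                # Transposition, with further edits allowed in between.
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),
            )
        last_row[s1[i - 1]] = i
    return d[len(s1) + 1][len(s2) + 1]


print(osa_distance('01', '120'))  # 3
print(dl_distance('01', '120'))   # 2
```

The two functions agree on a plain transposition such as 'ab' and 'ba'
(both give 1) and only diverge when a transposed pair needs a further
edit, which is exactly the case hypothesis found.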
I think the simplest solution is for textdistance to offer both
versions of the Damerau-Levenshtein distance (and I can contact the
authors of the other three packages to ask them to state in their
documentation whether they calculate the restricted or the
unrestricted distance). Then textdistance/libraries.py would instead
say:
    prototype.register('DamerauLevenshteinUnrestricted',
        LibraryBase('abydos.distance', 'DamerauLevenshtein'))
    prototype.register('DamerauLevenshteinUnrestricted',
        TextLibrary('jellyfish', 'damerau_levenshtein_distance'))
    prototype.register('DamerauLevenshteinRestricted',
        LibraryBase('pyxdameraulevenshtein', 'damerau_levenshtein_distance'))
    prototype.register('DamerauLevenshtein',
        LibraryBase('abydos.distance', 'DamerauLevenshtein'))
    prototype.register('DamerauLevenshtein',
        TextLibrary('jellyfish', 'damerau_levenshtein_distance'))
with 'DamerauLevenshtein' becoming the unrestricted one. Obviously,
textdistance/algorithms/edit_based.py would then need to have a new
DamerauLevenshteinUnrestricted class as well (with the existing class
renamed as DamerauLevenshteinRestricted, and then setting
DamerauLevenshtein = DamerauLevenshteinUnrestricted).
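As a rough sketch of the layout I have in mind for edit_based.py (the
class bodies here are empty stand-ins rather than the real textdistance
classes, and the "restricted" attribute is purely illustrative):

```python
class DamerauLevenshteinRestricted:
    """Stand-in for the existing class, which computes the restricted distance."""
    restricted = True  # illustrative attribute, not part of textdistance


class DamerauLevenshteinUnrestricted:
    """Stand-in for the proposed new class computing the unrestricted distance."""
    restricted = False  # illustrative attribute, not part of textdistance


# Keep the old name available, now pointing at the unrestricted variant.
DamerauLevenshtein = DamerauLevenshteinUnrestricted
```

Code that looks the algorithm up by name, as the test does with
getattr(textdistance, alg), would then get the unrestricted class for
'DamerauLevenshtein' and could still ask for either variant explicitly.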
MongeElkan uses DamerauLevenshtein, but as MongeElkan itself does not
appear in any external library, there is no problem there.