I don't see the error:
levdist NB. Taken from Rosetta Code
4 : 0
'a b'=. (x;y) /: (#x),(#y)
D=. >: iz =. i.#b
for_j. a do.
D=. <./\&.(-&iz) (>: D) <. (j ~: b) + |.!.j_index D
end.
{:D
)
'baa' levdist 'asap'
3
Henry Rich
On 11/15/2022 1:57 PM, 'Michael Day' via General wrote:
Not sure if this is General or Programming - sorry if wrong.
The J entry for this Rosetta Code "task", at
https://rosettacode.org/wiki/Levenshtein_distance#J
offers 2 J functions;
the first, "levenshtein", is a "literal transcription of the
Wikipedia implementation", to be found at
https://en.wikipedia.org/wiki/Levenshtein_distance
The second more J-oriented "levdist" is quicker and uses less memory.
However, to my surprise, it has some error. Consider this simple
example:
'baa' (levenshtein ; levdist) 'asap'
┌─┬─┐
│3│2│
└─┴─┘
I haven't found where levdist fails, though I suspect it's something
to do with insertion.
For what it's worth, I've worked up a J-version of the second
function suggested in
the Wikipedia article. The relevant paragraph is headed "Iterative
with two matrix rows."
It's apparently based on this reference:
https://www.codeproject.com/Articles/13525/Fast-memory-efficient-Levenshtein-algorithm-2
Here's my function, which I call levw, the w for Wiki:
NB. Single loop J implementation of above function
NB. Doesn't need explicit v1 array
levw =: 4 : 0
'm n'=. #each 's t'=. (x;y) /: (#x),(#y)
v0 =. i. n+1
for_i. i.m do.
scost =. (}:v0) + (i{s) ~: t{~ jl =. i.n NB. substitution cost
minsdcost=. scost <. >: v0{~ jl+1 NB. minimum of
substition cost and deletion cost
icost =. >:i NB. initial insertion
cost for row i (I think!)
v0 =. (<.>:) /\.&.: |. icost, minsdcost NB. minima of
insertion, deletion & substitution cost
end.
{:v0
)
(The Wiki function has a double loop, on i.m and i.n - the inner loop
is replaced by the (<.>:) /\.&.: |. stuff )
It performs correctly in the previous test; (but I haven't tested
edge cases)
'baa' (levenshtein ; levw) 'asap'
┌─┬─┐
│3│3│
└─┴─┘
The RosettaCode examples are trivial in time & space, but, FWIW,
'kitten' (levenshtein;levdist;levw)'sitting'
┌─┬─┬─┐
│3│3│3│
└─┴─┴─┘
timespacex '''kitten'' levenshtein ''sitting'''
0.0001067 4992
timespacex '''kitten'' levdist ''sitting''' NB. levdist is
correct in this case
1.31e_5 3072
timespacex '''kitten'' levw ''sitting'''
2.17e_5 3200
I would have preferred to communicate directly with the author of the
J entry, but don't
know how to find their name/s. So I do apologise to them for this
public message.
Cheers,
Mike
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm