I don't see the error:

   levdist   NB. Taken from Rosetta Code
4 : 0
  'a b'=. (x;y) /: (#x),(#y)
  D=. >: iz =. i.#b
  for_j. a do.
    D=. <./\&.(-&iz) (>: D) <. (j ~: b) + |.!.j_index D
  end.
  {:D
)

   'baa' levdist 'asap'
3

Henry Rich

On 11/15/2022 1:57 PM, 'Michael Day' via General wrote:
Not sure if this is General or Programming - sorry if wrong.

The J entry for this Rosetta Code "task",  at
https://rosettacode.org/wiki/Levenshtein_distance#J
offers 2 J functions;
the first,  "levenshtein",  is a "literal transcription of the Wikipedia implementation", to be found at
https://en.wikipedia.org/wiki/Levenshtein_distance

The second more J-oriented "levdist"  is quicker and uses less memory.
However,  to my surprise,  it has some error.  Consider this simple example:

   'baa' (levenshtein ; levdist) 'asap'
┌─┬─┐
│3│2│
└─┴─┘

I haven't found where levdist fails,  though I suspect it's something to do with insertion.

For what it's worth,  I've worked up a J-version of the second function suggested in the Wikipedia article.  The relevant paragraph is headed "Iterative with two matrix rows."
It's apparently based on this reference:
https://www.codeproject.com/Articles/13525/Fast-memory-efficient-Levenshtein-algorithm-2

Here's my function,  which I call levw,  the w for Wiki:

NB. Single loop J implementation of above function
NB. Doesn't need explicit v1 array
levw =: 4 : 0
'm n'=. #each 's t'=. (x;y) /: (#x),(#y)
v0   =. i. n+1
for_i. i.m do.
   scost    =. (}:v0) + (i{s) ~: t{~ jl =. i.n   NB. substitution cost
   minsdcost=. scost <. >: v0{~ jl+1             NB. minimum of substition cost and deletion cost    icost    =. >:i                               NB. initial insertion cost for row i (I think!)    v0       =. (<.>:) /\.&.: |. icost, minsdcost NB. minima of insertion, deletion & substitution cost
end.
{:v0
)

(The Wiki function has a double loop,  on i.m and i.n - the inner loop is replaced by the (<.>:) /\.&.: |.  stuff )

It performs correctly in the previous test;  (but I haven't tested edge cases)

   'baa' (levenshtein ; levw) 'asap'
┌─┬─┐
│3│3│
└─┴─┘

The RosettaCode examples are trivial in time & space, but, FWIW,

   'kitten' (levenshtein;levdist;levw)'sitting'
┌─┬─┬─┐
│3│3│3│
└─┴─┴─┘

   timespacex '''kitten'' levenshtein ''sitting'''
0.0001067 4992
   timespacex '''kitten'' levdist ''sitting'''    NB. levdist is correct in this case
1.31e_5 3072
   timespacex '''kitten'' levw ''sitting'''
2.17e_5 3200

I would have preferred to communicate directly with the author of the J entry,  but don't know how to find their name/s.  So I do apologise to them for this public message.

Cheers,

Mike
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to