I've just gone and had another look.  I checked and found that &.(-&iz)  DOES have an effect. My nearly final version had   (<.>:) /\.  as the essential step.  I couldn't see why your levdist
outperformed this,  until I removed the >:

So I've cribbed your device and used this inessential (or perhaps not!) step, ending up with:

NB. Single loop J implementation of above Wiki function
NB. Doesn't need explicit v1 array
levwr=: 4 : 0
NB. 's t'=. (x;y) /: (#x),(#y)
's t'=. x ( ;/:,&#) y
v0   =. iz =. i.->: #t =. |. t
for_si. s do.
   minsdcost =. (t ~: si) (>:@}:@] <. (+}.)) v0         NB. minima of substition and deletion costs NB. Henry Rich's insight was to use  &.(-&iz) saving the expensive use of (<. >:)    v0        =. <. /\.&.(-&iz) minsdcost,>:si_index    NB. minima of substitution, deletion & insertion costs NB. I had v0        =. (<.>:) /\. si_index (,>:)~ (t ~: si) (>:@}:@] <. (+ }.)) v0
end.
{.v0
)

("above" Wiki function is as in
https://www.codeproject.com/Articles/13525/Fast-memory-efficient-Levenshtein-algorithm-2 )

But this is so similar to yours as to be blatantly plagiaristic. Sorry to have suggested levdist was flawed. (I must have inadvertently removed that back-slash in <./\ in my copy of levdist with some typing glitch.)

It's actually brilliant.

Cheers,

Mike


On 15/11/2022 19:28, Henry Rich wrote:
I wrote levdist.  I don't remember much about it now.  The fragment

<./\&.(-&iz)

indicates flawed thinking (the &.(-&iz) has no effect).

I think I was trying to do the same thing your version does.

Henry Rich

On 11/15/2022 1:57 PM, 'Michael Day' via General wrote:
Not sure if this is General or Programming - sorry if wrong.

The J entry for this Rosetta Code "task",  at
https://rosettacode.org/wiki/Levenshtein_distance#J
offers 2 J functions;
the first,  "levenshtein",  is a "literal transcription of the Wikipedia implementation", to be found at
https://en.wikipedia.org/wiki/Levenshtein_distance

The second more J-oriented "levdist"  is quicker and uses less memory.
However,  to my surprise,  it has some error.  Consider this simple example:

   'baa' (levenshtein ; levdist) 'asap'
┌─┬─┐
│3│2│
└─┴─┘

I haven't found where levdist fails,  though I suspect it's something to do with insertion.

For what it's worth,  I've worked up a J-version of the second function suggested in the Wikipedia article.  The relevant paragraph is headed "Iterative with two matrix rows."
It's apparently based on this reference:
https://www.codeproject.com/Articles/13525/Fast-memory-efficient-Levenshtein-algorithm-2

Here's my function,  which I call levw,  the w for Wiki:

NB. Single loop J implementation of above function
NB. Doesn't need explicit v1 array
levw =: 4 : 0
'm n'=. #each 's t'=. (x;y) /: (#x),(#y)
v0   =. i. n+1
for_i. i.m do.
   scost    =. (}:v0) + (i{s) ~: t{~ jl =. i.n   NB. substitution cost
   minsdcost=. scost <. >: v0{~ jl+1             NB. minimum of substition cost and deletion cost    icost    =. >:i                               NB. initial insertion cost for row i (I think!)    v0       =. (<.>:) /\.&.: |. icost, minsdcost NB. minima of insertion, deletion & substitution cost
end.
{:v0
)

(The Wiki function has a double loop,  on i.m and i.n - the inner loop is replaced by the (<.>:) /\.&.: |.  stuff )

It performs correctly in the previous test;  (but I haven't tested edge cases)

   'baa' (levenshtein ; levw) 'asap'
┌─┬─┐
│3│3│
└─┴─┘

The RosettaCode examples are trivial in time & space, but, FWIW,

   'kitten' (levenshtein;levdist;levw)'sitting'
┌─┬─┬─┐
│3│3│3│
└─┴─┴─┘

   timespacex '''kitten'' levenshtein ''sitting'''
0.0001067 4992
   timespacex '''kitten'' levdist ''sitting'''    NB. levdist is correct in this case
1.31e_5 3072
   timespacex '''kitten'' levw ''sitting'''
2.17e_5 3200

I would have preferred to communicate directly with the author of the J entry,  but don't know how to find their name/s.  So I do apologise to them for this public message.

Cheers,

Mike
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to