I've just gone and had another look. I checked and found that &.(-&iz)
DOES have an effect.
My nearly final version had (<.>:) /\. as the essential step. I
couldn't see why your levdist
outperformed this, until I removed the >:
So I've cribbed your device and used this inessential (or perhaps not!)
step, ending up with:
NB. Single loop J implementation of above Wiki function
NB. Doesn't need explicit v1 array
levwr=: 4 : 0
NB. 's t'=. (x;y) /: (#x),(#y)
's t'=. x ( ;/:,&#) y
v0 =. iz =. i.->: #t =. |. t
for_si. s do.
minsdcost =. (t ~: si) (>:@}:@] <. (+}.)) v0 NB. minima of
substition and deletion costs
NB. Henry Rich's insight was to use &.(-&iz) saving the expensive use
of (<. >:)
v0 =. <. /\.&.(-&iz) minsdcost,>:si_index NB. minima of
substitution, deletion & insertion costs
NB. I had v0 =. (<.>:) /\. si_index (,>:)~ (t ~: si) (>:@}:@] <.
(+ }.)) v0
end.
{.v0
)
("above" Wiki function is as in
https://www.codeproject.com/Articles/13525/Fast-memory-efficient-Levenshtein-algorithm-2
)
But this is so similar to yours as to be blatantly plagiaristic. Sorry
to have suggested levdist was flawed.
(I must have inadvertently removed that back-slash in <./\ in my copy of
levdist with some typing glitch.)
It's actually brilliant.
Cheers,
Mike
On 15/11/2022 19:28, Henry Rich wrote:
I wrote levdist. I don't remember much about it now. The fragment
<./\&.(-&iz)
indicates flawed thinking (the &.(-&iz) has no effect).
I think I was trying to do the same thing your version does.
Henry Rich
On 11/15/2022 1:57 PM, 'Michael Day' via General wrote:
Not sure if this is General or Programming - sorry if wrong.
The J entry for this Rosetta Code "task", at
https://rosettacode.org/wiki/Levenshtein_distance#J
offers 2 J functions;
the first, "levenshtein", is a "literal transcription of the
Wikipedia implementation", to be found at
https://en.wikipedia.org/wiki/Levenshtein_distance
The second more J-oriented "levdist" is quicker and uses less memory.
However, to my surprise, it has some error. Consider this simple
example:
'baa' (levenshtein ; levdist) 'asap'
┌─┬─┐
│3│2│
└─┴─┘
I haven't found where levdist fails, though I suspect it's something
to do with insertion.
For what it's worth, I've worked up a J-version of the second
function suggested in
the Wikipedia article. The relevant paragraph is headed "Iterative
with two matrix rows."
It's apparently based on this reference:
https://www.codeproject.com/Articles/13525/Fast-memory-efficient-Levenshtein-algorithm-2
Here's my function, which I call levw, the w for Wiki:
NB. Single loop J implementation of above function
NB. Doesn't need explicit v1 array
levw =: 4 : 0
'm n'=. #each 's t'=. (x;y) /: (#x),(#y)
v0 =. i. n+1
for_i. i.m do.
scost =. (}:v0) + (i{s) ~: t{~ jl =. i.n NB. substitution cost
minsdcost=. scost <. >: v0{~ jl+1 NB. minimum of
substition cost and deletion cost
icost =. >:i NB. initial
insertion cost for row i (I think!)
v0 =. (<.>:) /\.&.: |. icost, minsdcost NB. minima of
insertion, deletion & substitution cost
end.
{:v0
)
(The Wiki function has a double loop, on i.m and i.n - the inner
loop is replaced by the (<.>:) /\.&.: |. stuff )
It performs correctly in the previous test; (but I haven't tested
edge cases)
'baa' (levenshtein ; levw) 'asap'
┌─┬─┐
│3│3│
└─┴─┘
The RosettaCode examples are trivial in time & space, but, FWIW,
'kitten' (levenshtein;levdist;levw)'sitting'
┌─┬─┬─┐
│3│3│3│
└─┴─┴─┘
timespacex '''kitten'' levenshtein ''sitting'''
0.0001067 4992
timespacex '''kitten'' levdist ''sitting''' NB. levdist is
correct in this case
1.31e_5 3072
timespacex '''kitten'' levw ''sitting'''
2.17e_5 3200
I would have preferred to communicate directly with the author of the
J entry, but don't
know how to find their name/s. So I do apologise to them for this
public message.
Cheers,
Mike
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm