This version stands alone and runs a little more efficiently. It hasn't been
tested exhaustively, but the results are encouraging. Interesting problem. I
can't think of a way to do it without comparing every character combination.
The new "Split string" command would speed part of this up if Collections could
be used.
Keith - CDI
$str1:="kaslfkshjflsfhlksadlfbskdjfblgiutgqoiwuflkfhaskhjfgkajshgaefgasjdfgkajshgfakjhgfuyf"
$str2:="askjfhaskfhlhflefhljksfhdlkjdshfljkhflsehflifhlksjdfhljhsaljdiodjejkljfhlajksdhflajsdhfljdhflkajhdsflwuhfl"
ARRAY TEXT($aMatch;0)
$longest:=getCommon2 ($str1;$str2;->$aMatch)
ALERT($longest) // = fhask
// ----------------------------------------------------
// Method: getCommon2
// -
// INPUT1: Text - to compare characters
// INPUT2: Text - to compare characters
// INPUT3: Pointer - to text array of all common character strings
// OUTPUT: Text - a longest match
// ----------------------------------------------------
C_TEXT($str1;$str2;$str3;$1;$2;$0;$longestFound;$longer;$shorter;$sepChar2;$padStr;$sepChar;$word)
C_LONGINT($i;$j;$longLen;$loop;$m;$pos1;$pos2;$shortLen;$size;$newStart;$start)
C_POINTER($3) // pointer to the caller's text array (needed in compiled mode)
$str1:=$1
$str2:=$2
$sepChar:="‡" //Char(4) // any character that would not be in the string
$longestFound:=""
If (Length($str1)>Length($str2))
$longer:=$str1
$shorter:=$str2
Else
$longer:=$str2
$shorter:=$str1
End if
$shortLen:=Length($shorter)
$padStr:=$sepChar*$shortLen
$longer:=$padStr+$longer
$longLen:=Length($longer)
$loop:=1
ARRAY TEXT($aMatch;0)
Repeat
Case of
: ($loop<=$shortLen) // starting
$pos1:=$shortLen
$pos2:=$shortLen+$loop
: ($loop>=$shortLen) & (($shortLen+$loop)<=$longLen)
$pos1:=$loop
$pos2:=$shortLen+$loop
Else
$pos1:=$loop
$pos2:=$longLen
End case
$shorter:=$sepChar+$shorter // slide the shorter string along the longer one
$str3:=$sepChar*($pos2)
For ($i;$pos1;$pos2) // compare the vertical chars
For ($j;$pos1;$i)
If ($shorter[[$j]]=$longer[[$j]])
$str3[[$j]]:=$shorter[[$j]] // record a match
End if
End for
End for
// ------------------------- break out the matches
$start:=1
ARRAY TEXT($aTemp;0)
$sepChar2:=$sepChar+$sepChar
Repeat
$str3:=Replace string($str3;$sepChar2;$sepChar;*) // remove all but a single sepChar
Until (Position($sepChar2;$str3;*)=0) // still need to check for the sepChar at the beginning or end?
Repeat
$newStart:=Position($sepChar;$str3;$start;*)
If ($newStart=0)
$word:=Substring($str3;$start) // get the last word
Else
$word:=Substring($str3;$start;$newStart-$start)
End if
APPEND TO ARRAY($aTemp;$word)
$start:=$newStart+1
Until ($newStart=0)
// ------------------------- add new matches, remember the longest
$size:=Size of array($aTemp)
For ($m;1;$size)
If (Find in array($aMatch;$aTemp{$m})=-1) // add the match if it's not already there
APPEND TO ARRAY($aMatch;$aTemp{$m})
If (Length($aTemp{$m})>Length($longestFound)) // remember the longest
$longestFound:=$aTemp{$m}
End if
End if
End for
$loop:=$loop+1
Until ($pos1>$longLen)
// | (Length($longestFound)>($longLen-$pos1))
// could stop when the greatest found length exceeds the # of chars remaining to check
// if the complete array is not needed
COPY ARRAY($aMatch;$3->)
$0:=$longestFound
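
If Collections can be used (this assumes 4D v17 or later), the two Repeat loops that collapse and split $str3 could probably be replaced by a single Split string call. What follows is only an untested sketch, not part of getCommon2: the sample value of $str3 is made up to stand in for the match mask built by the comparison loop, and $words is a hypothetical name.

```4d
  // Sketch only: assumes 4D v17+ (Split string, collections, For each)
C_TEXT($sepChar;$str3;$word;$longestFound)
C_COLLECTION($words)  // hypothetical name, not in the original method
ARRAY TEXT($aMatch;0)

$sepChar:="‡"
$str3:="‡‡ab‡‡‡fhask‡a‡‡"  // made-up sample of a match mask
$longestFound:=""

  // sk ignore empty strings drops the empty items produced by runs of
  // sepChar and by a sepChar at the beginning or end
$words:=Split string($str3;$sepChar;sk ignore empty strings)

For each ($word;$words)
	If (Find in array($aMatch;$word)=-1)  // add the match if it's not already there
		APPEND TO ARRAY($aMatch;$word)
		If (Length($word)>Length($longestFound))  // remember the longest
			$longestFound:=$word
		End if
	End if
End for each

ALERT($longestFound)  // fhask
```

Because empty items are skipped, this also avoids adding an empty string to the match array when $str3 starts or ends with the separator.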
> On Aug 23, 2019, at 8:27 AM, Chip Scheide via 4D_Tech <[email protected]>
> wrote:
>
>
> well...
> if there is a double space in one and not the other then the longest
> duplicated string would be ' dog'.
> So that answer would be correct.
>
> Also, overall, I would be interested in character duplication rather than
> word duplication.
>
> Thanks
> for all the input!!
>
> Chip
>
>> That’s rather what I thought, in which case it won’t work for
>> Chip’s original query.
>>
>>> On 22 Aug 2019, at 21:12, Chip Scheide via 4D_Tech
>>> <[email protected]> wrote:
>>>
>>> Given 2 strings,
>>> I want to find, and return, the longest substring which is the same in
>>> both, regardless where in either string the longest substring starts.
>>>
>>> ex:
>>> 1- This is my dog
>>> 2- My dog does not have fleas
>>> longest common string is 'my dog'
>>
>> 1 - This is my  dog // note double space after “my”
>> 2 - My dog does not have fleas
>>
>> longest common string is “ dog “ (and in Chip’s example, it’s
>> actually “my dog “).
>>
>> Jeremy
>>
>>> On 23 Aug 2019, at 13:46, Keisuke Miyako via 4D_Tech
>>> <[email protected]> wrote:
>>>
>>> GET TEXT KEYWORDS breaks strings the same way as when you
>>> double-click a word in a text editor.
>>>
>>> spaces, tabs, etc. are boundaries,
>>> commas periods and apostrophes depend on the context.
>>>
>>> e.g. (one word)
>>> 1,000,000 (one word)
>>> Macy's (one word)
>>>
>>> http://userguide.icu-project.org/boundaryanalysis
>>>
>>> On 2019/08/23 at 21:39, Jeremy Roussak via 4D_Tech
>>> <[email protected]> wrote:
>>> What about double spaces?