Re: Regex expert needed??

Keith Culotta via 4D_Tech Sat, 24 Aug 2019 09:49:46 -0700

This version stands alone, and runs a little more efficiently.  It's not been 
tested in every way, but the results are encouraging.  Interesting problem.  I 
can't think of a way to do it without comparing every character combination.  
The new "Split string" command would speed part of this up if Collections could 
be used.
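
As a rough illustration of the splitting step mentioned above, here is a minimal sketch in Python (not 4D, simply so it can be run anywhere). It assumes the per-position matches have already been written into a string that holds a separator character everywhere else. The names `split_matches` and `SEP` are hypothetical and do not appear in the method below.

```python
SEP = "\u2021"  # the double dagger; assumed never to occur in the input text


def split_matches(marked: str, sep: str = SEP) -> list[str]:
    """Split a separator-marked string into its non-empty runs of matched characters."""
    # str.split yields empty strings for leading, trailing and doubled
    # separators; filtering them out stands in for the collapse-then-scan loops.
    return [run for run in marked.split(sep) if run]
```

In 4D, `Split string` accepts an option to ignore empty strings, which would presumably play the same role as the filter here.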


Keith - CDI


$str1:="kaslfkshjflsfhlksadlfbskdjfblgiutgqoiwuflkfhaskhjfgkajshgaefgasjdfgkajshgfakjhgfuyf"
$str2:="askjfhaskfhlhflefhljksfhdlkjdshfljkhflsehflifhlksjdfhljhsaljdiodjejkljfhlajksdhflajsdhfljdhflkajhdsflwuhfl"
ARRAY TEXT($aMatch;0)
$longest:=getCommon2 ($str1;$str2;->$aMatch)
ALERT($longest)  // = fhask



  // ----------------------------------------------------
  // Method: getCommon2
  // - 
  // INPUT1: Text - to compare characters
  // INPUT2: Text - to compare characters
  // INPUT3: Pointer - to text array of all common character strings
  // OUTPUT: Text - a longest match
  // ----------------------------------------------------
C_TEXT($str1;$str2;$str3;$1;$2;$0;$longestFound;$longer;$shorter;$sepChar2;$padStr;$sepChar;$word)
C_LONGINT($i;$j;$longLen;$loop;$m;$pos1;$pos2;$shortLen;$size;$newStart;$start)

$str1:=$1
$str2:=$2
$sepChar:="‡"  //Char(4)  // any character that would not be in the string
$longestFound:=""

If (Length($str1)>Length($str2))
        $longer:=$str1
        $shorter:=$str2
Else 
        $longer:=$str2
        $shorter:=$str1
End if 

$shortLen:=Length($shorter)
$padStr:=$sepChar*$shortLen
$longer:=$padStr+$longer
$longLen:=Length($longer)
$loop:=1
ARRAY TEXT($aMatch;0)

Repeat 
        Case of 
                : ($loop<=$shortLen)  // starting
                        $pos1:=$shortLen
                        $pos2:=$shortLen+$loop
                : ($loop>=$shortLen) & (($shortLen+$loop)<=$longLen)
                        $pos1:=$loop
                        $pos2:=$shortLen+$loop
                Else 
                        $pos1:=$loop
                        $pos2:=$longLen
        End case 

        $shorter:=$sepChar+$shorter  // slide the shorter string one position along the longer one
        $str3:=$sepChar*($pos2)

        For ($j;$pos1;$pos2)  // compare the vertical chars; a single pass covers the whole overlap
                If ($shorter[[$j]]=$longer[[$j]])
                        $str3[[$j]]:=$shorter[[$j]]  // record a match
                End if 
        End for 

          // ------------------------- break out the matches
        $start:=1
        ARRAY TEXT($aTemp;0)
        $sepChar2:=$sepChar+$sepChar

        Repeat 
                $str3:=Replace string($str3;$sepChar2;$sepChar;*)  // remove all but a single sepChar
        Until (Position($sepChar2;$str3;*)=0)  // still need to check for the sepChar at the beginning or end?

        Repeat 
                $newStart:=Position($sepChar;$str3;$start;*)
                If ($newStart=0)
                        $word:=Substring($str3;$start)  // get the last word
                Else 
                        $word:=Substring($str3;$start;$newStart-$start)
                End if 
                APPEND TO ARRAY($aTemp;$word)
                $start:=$newStart+1
        Until ($newStart=0)
          // ------------------------- break out the matches

        $size:=Size of array($aTemp)
        For ($m;1;$size)
                If (Find in array($aMatch;$aTemp{$m})=-1)  // add the match if it's not already there
                        APPEND TO ARRAY($aMatch;$aTemp{$m})
                        If (Length($aTemp{$m})>Length($longestFound))  // remember the longest
                                $longestFound:=$aTemp{$m}
                        End if 
                End if 
        End for 

        $loop:=$loop+1

Until ($pos1>$longLen)
  // | (Length($longestFound)>($longLen-$pos1))
  //     could stop when the greatest found length exceeds the # of chars remaining to check
  //     if the complete array is not needed

COPY ARRAY($aMatch;$3->)
$0:=$longestFound
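
For readers who do not use 4D, the sliding comparison can be sketched in Python. This is not a port of getCommon2: the function name `longest_common_run`, the per-character `casefold()` comparison (a rough stand-in for 4D's case-insensitive `=` on text) and the skip of overlaps too short to win are all assumptions made for illustration. It returns only one longest match and does not build the array of all matches.

```python
def longest_common_run(a: str, b: str) -> str:
    """Return one longest substring common to a and b (hypothetical helper)."""
    best_start, best_len = 0, 0
    # Slide b across a: offset is where b[0] sits relative to a[0].
    for offset in range(-(len(b) - 1), len(a)):
        lo = max(0, offset)                # first overlapping index in a
        hi = min(len(a), offset + len(b))  # one past the last overlapping index
        if hi - lo <= best_len:
            continue  # this overlap is too short to beat the current best
        run = 0
        for i in range(lo, hi):
            # Assumption: compare case-insensitively, character by character.
            if a[i].casefold() == b[i - offset].casefold():
                run += 1
                if run > best_len:
                    best_len, best_start = run, i - run + 1
            else:
                run = 0
    return a[best_start:best_start + best_len]
```

With the examples from this thread, `longest_common_run("This is my dog", "My dog does not have fleas")` gives `"my dog"`, and the double-space variant `"This is my  dog"` gives `" dog"`. Every alignment is still compared character by character, so the work remains proportional to the product of the two lengths.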



> On Aug 23, 2019, at 8:27 AM, Chip Scheide via 4D_Tech <[email protected]> 
> wrote:
> 
> 
> well...
> if there is a double space in one and not the other then the longest 
> duplicated string would be ' dog'.
> So that answer would be correct.
> 
> Also, overall, I would be interested in character duplication rather than 
> word duplication.
> 
> Thanks
> for all the input!!
> 
> Chip
> 
>> That’s rather what I thought, in which case it won’t work for Chip’s 
>> original query.
>> 
>>> On 22 Aug 2019, at 21:12, Chip Scheide via 4D_Tech 
>>> <[email protected]> wrote:
>>> 
>>> Given 2 strings, 
>>> I want to find, and return, the longest substring which is the same in 
>>> both, regardless where in either string the longest substring starts.
>>> 
>>> ex: 
>>> 1- This is my dog
>>> 2- My dog does not have fleas
>>> longest common string is 'my dog’
>> 
>> 1 - This is my  dog                        // note double space after 
>> “my” 
>> 2 - My dog does not have fleas
>> 
>> longest common string is “ dog “ (and in Chip’s example, it’s 
>> actually “my dog “).
>> 
>> Jeremy
>> 
>>> On 23 Aug 2019, at 13:46, Keisuke Miyako via 4D_Tech 
>>> <[email protected]> wrote:
>>> 
>>> GET TEXT KEYWORDS breaks strings the same way as when you 
>>> double-click a word in a text editor.
>>> 
>>> spaces, tabs, etc. are boundaries,
>>> commas periods and apostrophes depend on the context.
>>> 
>>> e.g. (one word)
>>> 1,000,000 (one word)
>>> Macy's (one word)
>>> 
>>> http://userguide.icu-project.org/boundaryanalysis
>>> 
>>> On 2019/08/23 21:39, Jeremy Roussak via 4D_Tech 
>>> <[email protected]> wrote:
>>> What about double spaces?

**********************************************************************
4D Internet Users Group (4D iNUG)
Archive:  http://lists.4d.com/archives.html
Options: https://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[email protected]
**********************************************************************
