Hi Raul,

That is working well.

Thanks,
Rik

On Fri, Apr 8, 2022 at 11:04 AM Raul Miller <[email protected]> wrote:

> Here's a fix for the double match from '$' rxmatches 'is'
>
> rxmatches_jregex_=: 4 : 0
>   'p n'=. 2 {. boxopen x
>   regcomp p
>   m=. regmatch1 y
>   if. _1 = {.{.m do. i.0 1 2 return. end.
>   s=. +/0 1>.{.m
>   r=. ,: m
>   while. s <:#y do.
>     if. _1 = {.{.m=. regmatch2 y;s do. break. end.
>     s=. (s+1) >. +/ {.m
>     r=. r, m
>   end.
>   if. #n do. n{"2 r end.
> )
>
> In other words:
>
> My first fix allowed us to search the empty position after the last
> character in the string.
>
> My second fix adopts a mild variation of the "next position" loop
> mechanism for dealing with the necessary shift from the first match
> (regmatch1) and the subsequent matches (regmatch2).
>
> The general problem with matching the empty string is that there's an
> infinite number of them. So, technically, there are actually two such
> matches at the end of the string (and there's three, etc.).
>
> But, for a minimal result what we do is enforce mechanisms which
> always advance at least one character beyond a match of an empty
> substring.
>
> I hope this helps,
>
> --
> Raul
>
> On Fri, Apr 8, 2022 at 10:28 AM Rik Renich <[email protected]> wrote:
> >
> > Hi Raul,
> >
> > Thanks for the quick work. It does indeed fix the last two examples, but
> > it creates an extra match in the first one. The pcre documentation did
> > warn us it was tricky! Hopefully the test scripts I recently sent will
> > prove useful.
> >
> > Thanks again,
> > Rik
> >
> > P.S. Raul, it's good to hear from you after such a long time.
> >
> > On Thu, Apr 7, 2022 at 11:44 PM Raul Miller <[email protected]> wrote:
> > >
> > > Here's a fixed version of rxmatches:
> > >
> > > rxmatches_jregex_=: 4 : 0
> > >   'p n'=. 2 {. boxopen x
> > >   regcomp p
> > >   m=. regmatch1 y
> > >   if. _1 = {.{.m do. i.0 1 2 return. end.
> > >   s=. 1 >. +/{.m
> > >   r=. ,: m
> > >   while. s <:#y do.
> > >     if. _1 = {.{.m=. regmatch2 y;s do. break. end.
> > >     s=. (s+1) >. +/ {.m
> > >     r=. r, m
> > >   end.
> > >   if. #n do. n{"2 r end.
> > > )
> > >
> > > FYI,
> > >
> > > --
> > > Raul
> > >
> > > On Wed, Apr 6, 2022 at 10:20 PM Rik Renich <[email protected]> wrote:
> > > >
> > > > There seems to be a bug in rxmatches. I expect ('|$' rxmatches 'is') to
> > > > have three matches, but the final one is omitted. Likewise for an empty
> > > > regex. For comparison with perl:
> > > >
> > > > cat | perl
> > > > $_= "is"; s/$/--/g; print "$_\n";
> > > > $_= "is"; s/|$/--/g; print "$_\n";
> > > > $_= "is"; s//--/g; print "$_\n";
> > > >
> > > > is--
> > > > --i--s--
> > > > --i--s--
> > > >
> > > > Note that perl matches the end of the string for all 3 cases.
> > > >
> > > > jc
> > > > s=: 'is'
> > > > (<'--') ('$' rxmatches s) rxmerge s
> > > > is--
> > > > (<'--') ('|$' rxmatches s) rxmerge s
> > > > --i--s
> > > > (<'--') ('' rxmatches s) rxmerge s
> > > > --i--s
> > > > exit''
> > > >
> > > > Note that I have used rxmerge to mimic the example given in perl. However,
> > > > the unexpected result comes from rxmatches.
> > > >
> > > > As these examples show, rxmatches is not compatible with perl for the
> > > > second 2 cases. Clearly the second case should match the end of the
> > > > string, as one clause of the regex is "end of string." The third case
> > > > seems to be the same bug.
> > > >
> > > > If you visit https://www.pcre.org/original/doc/html/pcreapi.html and search
> > > > for "tricky" you will find:
> > > >
> > > > Finding all the matches in a subject is tricky when the pattern can match
> > > > an empty string. It is possible to emulate Perl's /g behaviour by first
> > > > trying the match again at the same offset, with the PCRE_NOTEMPTY_ATSTART
> > > > and PCRE_ANCHORED options, and then if that fails, advancing the starting
> > > > offset and trying an ordinary match again. There is some code that
> > > > demonstrates how to do this in the pcredemo sample program. In the most
> > > > general case, you have to check to see if the newline convention recognizes
> > > > CRLF as a newline, and if so, and the current character is CR followed by
> > > > LF, advance the starting offset by two characters instead of one.
> > > >
> > > > I have tried pcredemo and it provides results consistent with perl.
> > > >
> > > > I have provided test scripts in both perl and ijs, along with a minimal
> > > > test file.
> > > >
> > > > Thanks,
> > > > Rik
> > > > ----------------------------------------------------------------------
> > > > For information about J forums see http://www.jsoftware.com/forums.htm
