Here's a fixed version of rxmatches:
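rxmatches_jregex_=: 4 : 0
'p n'=. 2 {. boxopen x                    NB. p: pattern, n: optional subpattern indices
regcomp p                                 NB. compile the pattern
m=. regmatch1 y                           NB. first match
if. _1 = {.{.m do. i.0 1 2 return. end.   NB. no match at all: empty result
s=. 1 >. +/{.m                            NB. next start offset: end of the match, at least 1
r=. ,: m
while. s <:#y do.                         NB. <: so that offset #y (end of string) is searched too
  if. _1 = {.{.m=. regmatch2 y;s do. break. end.
  s=. (s+1) >. +/ {.m                     NB. always advance by at least one
  r=. r, m
end.
if. #n do. n{"2 r end.                    NB. keep only the requested subpatterns, if any
)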
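
For anyone who wants to experiment with the scanning loop without the J regex addon, here is a rough Python analogue. It is only a sketch under assumptions: Python's re module stands in for PCRE, the name all_matches is made up for illustration, and it is not the retry scheme from the PCRE docs. It also differs from the J verb in one respect: after an empty match it restarts one character past that match.

```python
import re

def all_matches(pattern, text):
    """Return (offset, length) pairs for successive matches, empty ones included.

    Hypothetical helper for illustration only; not part of any library.
    """
    rx = re.compile(pattern)
    found = []
    pos = 0
    # <= rather than <, so a match at the very end of the text is tried too
    while pos <= len(text):
        m = rx.search(text, pos)
        if m is None:
            break
        found.append((m.start(), m.end() - m.start()))
        if m.end() == m.start():
            # empty match: step one character past it to guarantee progress
            pos = m.end() + 1
        else:
            # non-empty match: resume right after it
            pos = m.end()
    return found
```

With this sketch, all_matches('|$', 'is') returns [(0, 0), (1, 0), (2, 0)] and all_matches('$', 'is') returns [(2, 0)], which lines up with the three perl results quoted below.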
FYI,
--
Raul
On Wed, Apr 6, 2022 at 10:20 PM Rik Renich <[email protected]> wrote:
>
> There seems to be a bug in rxmatches. I expect ('|$' rxmatches 'is') to
> have three matches, but the final one is omitted. Likewise for an empty
> regex. For comparison with perl:
>
> cat | perl
> $_= "is"; s/$/--/g; print "$_\n";
> $_= "is"; s/|$/--/g; print "$_\n";
> $_= "is"; s//--/g; print "$_\n";
>
> is--
> --i--s--
> --i--s--
>
> Note that perl matches the end of the string for all 3 cases.
>
> jc
> s=: 'is'
> (<'--') ('$' rxmatches s) rxmerge s
> is--
> (<'--') ('|$' rxmatches s) rxmerge s
> --i--s
> (<'--') ('' rxmatches s) rxmerge s
> --i--s
> exit''
>
> Note that I have used rxmerge to mimic the example given in perl. However,
> the unexpected result comes from rxmatches.
>
> As these examples show, rxmatches is not compatible with perl for the
> last two cases. Clearly the second case should match the end of the
> string, as one clause of the regex is "end of string." The third case
> seems to be the same bug.
>
> If you visit https://www.pcre.org/original/doc/html/pcreapi.html and search
> for "tricky" you will find:
>
> Finding all the matches in a subject is tricky when the pattern can match
> an empty string. It is possible to emulate Perl's /g behaviour by first
> trying the match again at the same offset, with the PCRE_NOTEMPTY_ATSTART
> and PCRE_ANCHORED options, and then if that fails, advancing the starting
> offset and trying an ordinary match again. There is some code that
> demonstrates how to do this in the pcredemo sample program. In the most
> general case, you have to check to see if the newline convention recognizes
> CRLF as a newline, and if so, and the current character is CR followed by
> LF, advance the starting offset by two characters instead of one.
>
> I have tried pcredemo and it provides results consistent with perl.
>
> I have provided test scripts in both perl and ijs, along with a minimal
> test file.
>
> Thanks,
> Rik
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm