Here's a fixed version of rxmatches:

rxmatches_jregex_=: 4 : 0
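'p n'=. 2 {. boxopen x                    NB. p: pattern, n: optional subexpression indices
regcomp p
m=. regmatch1 y                           NB. first match, as (offset,length) rows
if. _1 = {.{.m do. i.0 1 2 return. end.   NB. no match: empty result
s=. 1 >. +/{.m                            NB. next start: end of the match, but at least 1
r=. ,: m
while. s <:#y do.                         NB. <: also tries offset #y, the end of the string
  if. _1 = {.{.m=. regmatch2 y;s do. break. end.
  s=. (s+1) >. +/ {.m                     NB. always advance by at least one character
  r=. r, m
end.
if. #n do. n{"2 r end.                    NB. keep only the requested subexpressions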
)
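
For readers without J at hand, here is a rough Python sketch of the same kind of scanning loop. It is not part of jregex: the name scan_matches and the include_end flag are made up for illustration, and it uses Python's re module rather than PCRE. It only shows why the loop has to be allowed to try the offset equal to the length of the string when the pattern can match an empty string.

```python
import re

def scan_matches(pattern, text, include_end=True):
    """Collect (offset, length) pairs for successive matches of pattern.

    Hypothetical helper for illustration only. include_end=True lets the
    scan try offset len(text); include_end=False stops one short of it.
    """
    rx = re.compile(pattern)
    found = []
    m = rx.search(text)
    while m is not None:
        start, end = m.start(), m.end()
        found.append((start, end - start))
        # After an empty match, step one character past it so the scan
        # cannot loop forever on the same offset.
        s = end if end > start else end + 1
        in_range = s <= len(text) if include_end else s < len(text)
        if not in_range:
            break
        m = rx.search(text, s)
    return found

print(scan_matches(r'|$', 'is'))                     # three empty matches
print(scan_matches(r'|$', 'is', include_end=False))  # final one is missed
```

Note that this sketch steps past an empty match by one character, which is simpler than the PCRE_NOTEMPTY_ATSTART retry described in the PCRE documentation and is not identical to the update of s in the J code above.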

FYI,

-- 
Raul

On Wed, Apr 6, 2022 at 10:20 PM Rik Renich <[email protected]> wrote:
>
> There seems to be a bug in rxmatches.  I expect ('|$' rxmatches 'is') to
> have three matches, but the final one is omitted.  Likewise for an empty
> regex.  For comparison with perl:
>
>     cat | perl
>     $_= "is"; s/$/--/g; print "$_\n";
>     $_= "is"; s/|$/--/g; print "$_\n";
>     $_= "is"; s//--/g; print "$_\n";
>
>     is--
>     --i--s--
>     --i--s--
>
> Note that perl matches the end of the string for all 3 cases.
>
>     jc
>        s=: 'is'
>        (<'--') ('$' rxmatches s) rxmerge s
>     is--
>        (<'--') ('|$' rxmatches s) rxmerge s
>     --i--s
>        (<'--') ('' rxmatches s) rxmerge s
>     --i--s
>        exit''
>
> Note that I have used rxmerge to mimic the example given in perl.  However,
> the unexpected result comes from rxmatches.
>
> As these examples show, rxmatches is not compatible with perl in the
> last two cases.  Clearly the second case should match the end of the
> string, as one alternative of the regex is "end of string."  The third case
> seems to be the same bug.
>
> If you visit https://www.pcre.org/original/doc/html/pcreapi.html and search
> for "tricky" you will find:
>
> Finding all the matches in a subject is tricky when the pattern can match
> an empty string. It is possible to emulate Perl's /g behaviour by first
> trying the match again at the same offset, with the PCRE_NOTEMPTY_ATSTART
> and PCRE_ANCHORED options, and then if that fails, advancing the starting
> offset and trying an ordinary match again. There is some code that
> demonstrates how to do this in the pcredemo sample program. In the most
> general case, you have to check to see if the newline convention recognizes
> CRLF as a newline, and if so, and the current character is CR followed by
> LF, advance the starting offset by two characters instead of one.
>
> I have tried pcredemo and it provides results consistent with perl.
>
> I have provided test scripts in both perl and ijs, along with a minimal
> test file.
>
> Thanks,
> Rik
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
