Hi Raul,

Thanks for the quick work.  It does indeed fix the last two examples, but
it creates an extra match in the first one (the '$' pattern).  The PCRE
documentation did warn us it was tricky!  Hopefully the test scripts I
recently sent will prove useful.

Thanks again,
Rik

P.S.  Raul, it's good to hear from you after such a long time.
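
For reference, the match positions expected in the three examples can be
cross-checked outside J and Perl.  The following is only a minimal sketch
in Python, whose standard re module also reports an empty match at the end
of the subject in these cases.  The helper names match_spans and merge are
hypothetical and merely mimic the (offset, length) rows of rxmatches and
the splicing done by rxmerge; this is not the jregex implementation, and
it does not use the PCRE_NOTEMPTY_ATSTART retry described in the PCRE
documentation.

```python
import re


def match_spans(pattern, subject):
    """Return (offset, length) pairs for every match, empty ones included.

    Hypothetical helper: it imitates the shape of an rxmatches result,
    but relies on Python's own re.finditer for the matching.
    """
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(pattern, subject)]


def merge(replacement, pattern, subject):
    """Splice `replacement` over each reported match.

    Hypothetical helper in the spirit of rxmerge.  Working from right to
    left keeps the earlier offsets valid while the string grows.
    """
    out = subject
    for offset, length in reversed(match_spans(pattern, subject)):
        out = out[:offset] + replacement + out[offset + length:]
    return out


if __name__ == "__main__":
    # The three patterns from the original report, applied to 'is'.
    for pattern in ("$", "|$", ""):
        print(repr(pattern), match_spans(pattern, "is"),
              merge("--", pattern, "is"))
```

Under these assumptions, '$' yields the single pair (2, 0), while '|$' and
'' each yield three empty matches at offsets 0, 1 and 2, which agrees with
the perl output is--, --i--s-- and --i--s--.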

On Thu, Apr 7, 2022 at 11:44 PM Raul Miller <[email protected]> wrote:

> Here's a fixed version of rxmatches:
>
> rxmatches_jregex_=: 4 : 0
> 'p n'=. 2 {. boxopen x
> regcomp p
> m=. regmatch1 y
> if. _1 = {.{.m do. i.0 1 2 return. end.
> s=. 1 >. +/{.m
> r=. ,: m
> while. s <:#y do.
>   if. _1 = {.{.m=. regmatch2 y;s do. break. end.
>   s=. (s+1) >. +/ {.m
>   r=. r, m
> end.
> if. #n do. n{"2 r end.
> )
>
> FYI,
>
> --
> Raul
>
> On Wed, Apr 6, 2022 at 10:20 PM Rik Renich <[email protected]> wrote:
> >
> > There seems to be a bug in rxmatches.  I expect ('|$' rxmatches 'is') to
> > have three matches, but the final one is omitted.  Likewise for an empty
> > regex.  For comparison with perl:
> >
> >     cat | perl
> >     $_= "is"; s/$/--/g; print "$_\n";
> >     $_= "is"; s/|$/--/g; print "$_\n";
> >     $_= "is"; s//--/g; print "$_\n";
> >
> >     is--
> >     --i--s--
> >     --i--s--
> >
> > Note that perl matches the end of the string for all 3 cases.
> >
> >     jc
> >        s=: 'is'
> >        (<'--') ('$' rxmatches s) rxmerge s
> >     is--
> >        (<'--') ('|$' rxmatches s) rxmerge s
> >     --i--s
> >        (<'--') ('' rxmatches s) rxmerge s
> >     --i--s
> >        exit''
> >
> > Note that I have used rxmerge to mimic the example given in perl.  However,
> > the unexpected result comes from rxmatches.
> >
> > As these examples show, rxmatches is not compatible with perl for the
> > last two cases.  Clearly the second case should match the end of the
> > string, as one clause of the regex is "end of string."  The third case
> > seems to be the same bug.
> >
> > If you visit https://www.pcre.org/original/doc/html/pcreapi.html and search
> > for "tricky" you will find:
> >
> > Finding all the matches in a subject is tricky when the pattern can match
> > an empty string. It is possible to emulate Perl's /g behaviour by first
> > trying the match again at the same offset, with the PCRE_NOTEMPTY_ATSTART
> > and PCRE_ANCHORED options, and then if that fails, advancing the starting
> > offset and trying an ordinary match again. There is some code that
> > demonstrates how to do this in the pcredemo sample program. In the most
> > general case, you have to check to see if the newline convention recognizes
> > CRLF as a newline, and if so, and the current character is CR followed by
> > LF, advance the starting offset by two characters instead of one.
> >
> > I have tried pcredemo and it provides results consistent with perl.
> >
> > I have provided test scripts in both perl and ijs, along with a minimal
> > test file.
> >
> > Thanks,
> > Rik
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm