I don't have one for overlapped matches, but I do have
one for non-overlapped matches, which may shed some
light on the former.

http://www.jsoftware.com/jwiki/Essays/Non-Overlapping_Substrings
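For comparison, here is a minimal Python sketch of the relationship between the two cases. It is an illustration only, not the essay's J code, and the function names are hypothetical. It first finds every start position of a literal substring, overlapping ones included (roughly what J's `I. x E. y` gives). It then selects a non-overlapping subset with one common approach, a greedy left-to-right pass.

```python
def all_occurrences(text, sub):
    """Every start index of sub in text, overlapping occurrences included."""
    if not sub:
        raise ValueError("sub must be non-empty")
    n = len(sub)
    # Test every start position, so overlapping occurrences are all found.
    return [i for i in range(len(text) - n + 1) if text[i:i + n] == sub]


def non_overlapping(starts, n):
    """Greedy left-to-right selection from sorted start indices."""
    kept = []
    next_free = 0  # first index not covered by the last kept occurrence
    for i in starts:
        # Keep an occurrence only if it begins at or after the end of the
        # previously kept one (each occurrence has length n).
        if i >= next_free:
            kept.append(i)
            next_free = i + n
    return kept


starts = all_occurrences("aaaaa", "aa")
print(starts)                      # [0, 1, 2, 3]
print(non_overlapping(starts, 2))  # [0, 2]
```

With "aa" in "aaaaa", the overlapping search finds four starts, but only two survive the greedy pass. That is what a consume-and-continue scan would report.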
----- Original Message -----
From: Oleg Kobchenko
Date: Friday, January 15, 2010 13:28
Subject: Re: [Jbeta] bug in regexp support distributed with J
To: Beta forum

> > From: Raul Miller
> > 
> > On Fri, Jan 15, 2010 at 3:26 AM, Oleg Kobchenko wrote:
> > >> From: Raul Miller 
> > >> On Thu, Jan 14, 2010 at 6:39 PM, Oleg Kobchenko wrote:
> > >> > Does it behave differently in Perl?
> > >>
> > >> Perl finds non-overlapping matches by default, but
> > >> lets you restart the match at any given position so
> > >> you can easily implement the overlapping matches
> > >> case.
> > >
> > > What does "restart the match" mean?
> > >
> > > Maybe you should do the same in J?
> > 
> > When matching in Perl, you can have the
> > regexp start at a specific index.  Since you
> > know where the previous match began,
> > you can start again at the following character.
> > 
> > Doing this in J would require writing an
> > explicit loop using rxmatch instead of
> > rxmatches and extracting the appropriate
> > substrings, all of which would be orders
> > of magnitude slower than the Perl approach.
> > 
> > >> > It looks like non-overlapping makes more sense.
> > >>
> > >> Both have uses.
> > >
> > > In any case, it does not look like a bug.
> > 
> > So I posted this to the wrong list?
> 
> "Not a bug" means you are asking for a feature
> that was not intended by design.
> 
> > But if this is not a bug in rxmatches, it is
> > then a bug in the documentation for
> > rxmatches, since rxmatches does not
> > actually return "all matches".
> 
> We need to distinguish a Match from a regex Group
> (a parenthesized and optionally named sub-match).
> Groups can be nested (wholly overlapped by the outer
> group), but they cannot partially overlap.
> 
> Note: in regex
>   (ab)|(cd)
> the parens are redundant (unless you want to signal
> which alternative triggered). Using "|" makes the two
> (or more) parts mutually exclusive (disjunctive), i.e.
> either one whole sub-expression or the other is
> matched.
> 
> What rxmatches does is find matches one after another:
> when a match is found, it consumes the input, and
> the next match starts after the last consumed character.
> So the matches are non-overlapping by definition.
> 
> I am not even familiar with the concept of "overlapping"
> matches. In general-purpose parsing, each character can
> belong to at most one match.
> 
> Are there any theory, references, or code samples for
> "overlapped" matches?
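The two behaviours discussed in this thread can be sketched with Python's `re` module. Python is an assumption made for illustration; this is not J's rxmatch/rxmatches API, and the function names are hypothetical. `re.finditer` resumes scanning after the end of each match, like the consume-and-continue behaviour described for rxmatches. `Pattern.search(string, pos)` accepts a start index, which allows the Perl-style restart one character after the previous match began.

```python
import re


def non_overlapping_matches(pattern, text):
    # finditer continues scanning after the end of each match,
    # so the matches it reports never overlap.
    return [(m.start(), m.group()) for m in re.finditer(pattern, text)]


def overlapping_matches(pattern, text):
    # Restart the search one character after where the previous match
    # *began*. Each start position yields at most one match: whichever
    # match the engine prefers there.
    rx = re.compile(pattern)
    found = []
    pos = 0
    while pos <= len(text):
        m = rx.search(text, pos)  # search beginning at index pos
        if m is None:
            break
        found.append((m.start(), m.group()))
        pos = m.start() + 1  # strictly increases, so the loop terminates
    return found


print(non_overlapping_matches(r"aba", "ababa"))  # [(0, 'aba')]
print(overlapping_matches(r"aba", "ababa"))      # [(0, 'aba'), (2, 'aba')]
```

Because only one match is reported per start position, a pattern such as `a+` on "aaa" gives `(0, 'aaa')`, `(1, 'aa')`, `(2, 'a')`. Shorter matches that begin at an already reported position, such as `(0, 'a')`, are not listed.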
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm