I don't have one for overlapped matches but do have one for non-overlapped matches, which may shed some light on the former.
http://www.jsoftware.com/jwiki/Essays/Non-Overlapping_Substrings

----- Original Message -----
From: Oleg Kobchenko <[email protected]>
Date: Friday, January 15, 2010 13:28
Subject: Re: [Jbeta] bug in regexp support distributed with J
To: Beta forum <[email protected]>

> > From: Raul Miller <[email protected]>
> >
> > On Fri, Jan 15, 2010 at 3:26 AM, Oleg Kobchenko wrote:
> > >> From: Raul Miller
> > >> On Thu, Jan 14, 2010 at 6:39 PM, Oleg Kobchenko wrote:
> > >> > Does it behave differently in Perl?
> > >>
> > >> Perl finds non-overlapping matches by default, but
> > >> lets you restart the match at any given position so
> > >> you can easily implement the overlapping matches
> > >> case.
> > >
> > > What does it mean "restart the match"?
> > >
> > > Maybe you should do the same in J?
> >
> > When matching in perl, you can have the
> > regexp start at a specific index. Since you
> > know where the previous match began,
> > you can start again at the following character.
> >
> > To do this in J would require forming an
> > explicit loop using rxmatch instead of
> > rxmatches and extracting the appropriate
> > substrings, all of which would be orders
> > of magnitude slower than the perl approach.
> >
> > >> > It looks like non-overlapping makes more sense.
> > >>
> > >> Both have uses.
> > >
> > > In any case it looks like not a bug.
> >
> > So I posted this to the wrong list?
>
> "Not a bug" means seeking a feature not intended
> by design.
>
> > But if this is not a bug in rxmatches, it is
> > then a bug in the documentation for
> > rxmatches, since rxmatches does not
> > actually return "all matches".
>
> We need to distinguish a Match from a regex Group
> (parenthesized and optionally named sub-match).
> Groups can be nested (wholly overlapped by the outer),
> but not partially overlapped.
>
> Note: in regex
> (ab)|(cd)
> the parens are redundant (unless you want to signal
> which alternative triggered). Using "|" makes the two
> (or more parts) mutually exclusive (or disjunctive), i.e.
> it's either all whole one or whole other sub-expression
> that is matched.
>
> What rxmatches does is it finds a match one after another,
> and when a match is found, it consumes the input and
> the next match starts after the last consumed character.
> So it is by definition non-overlapping.
>
> I do not even know the concept of "overlapping" matches.
> In general purpose parsing, each character can only be
> in up to one match.
>
> Is there any theory or references or code samples of
> "overlapped" matches?
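
To make the two behaviours discussed in the quoted thread concrete, here is a minimal sketch in Python (not J, and not a description of how rxmatches is implemented). It follows the idea Raul describes: after each match, restart the search one character past where that match began. The function names overlapping_matches and non_overlapping_matches are made up for this illustration.

```python
import re


def overlapping_matches(pattern, text):
    """Return (start, matched_text) pairs, allowing matches to overlap.

    After each match the search restarts one character past the start
    of that match, rather than after its last consumed character.
    """
    regex = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)  # search from index pos onward
        if m is None:
            break
        results.append((m.start(), m.group()))
        pos = m.start() + 1  # restart just after the match began
    return results


def non_overlapping_matches(pattern, text):
    """Return (start, matched_text) pairs using the default scan,
    where each match consumes its input before the next search."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    print(non_overlapping_matches("aba", "ababa"))  # [(0, 'aba')]
    print(overlapping_matches("aba", "ababa"))      # [(0, 'aba'), (2, 'aba')]
```

With the pattern aba and the text ababa, the default scan reports only the match at index 0, because the shared middle "a" has already been consumed. The restarting loop also reports the match at index 2. This is one possible reading of "overlapping matches"; it reports at most one match per starting position.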
