> From: Raul Miller
> 
> On Fri, Jan 15, 2010 at 3:26 AM, Oleg Kobchenko wrote:
> >> From: Raul Miller 
> >> On Thu, Jan 14, 2010 at 6:39 PM, Oleg Kobchenko wrote:
> >> > Does it behave differently in Perl?
> >>
> >> Perl finds non-overlapping matches by default, but
> >> lets you restart the match at any given position so
> >> you can easily implement the overlapping matches
> >> case.
> >
> > What does it mean "restart the match"?
> >
> > Maybe you should do the same in J?
> 
> When matching in perl, you can have the
> regexp start at a specific index.  Since you
> know where the previous match began,
> you can start again at the following character.
> 
> To do this in J would require forming an
> explicit loop using rxmatch instead of
> rxmatches and extracting the appropriate
> substrings, all of which would be orders
> of magnitude slower than the perl approach.
> 
> >> > It looks like non-overlapping makes more sense.
> >>
> >> Both have uses.
> >
> > In any case it looks like not a bug.
> 
> So I posted this to the wrong list?

"Not a bug" means you are asking for a feature that was
not intended by the design.

> But if this is not a bug in rxmatches, it is
> then a bug in the documentation for
> rxmatches, since rxmatches does not
> actually return "all matches".

We need to distinguish a match from a regex group
(a parenthesized and optionally named sub-match).
Groups can be nested (wholly contained in the outer group),
but they cannot partially overlap.
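A small illustration of nesting, using Python's standard re module as a stand-in (this is not J's regex library, and group_spans is a hypothetical helper): the named group "inner" sits entirely inside the named group "outer", so its span is contained in the outer span.

```python
import re

# A named outer group that wholly contains a named inner group.
PATTERN = re.compile(r"(?P<outer>(?P<inner>ab)c)")

def group_spans(text):
    """Return the (start, end) spans of the whole match and of both groups."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return {"match": m.span(), "outer": m.span("outer"), "inner": m.span("inner")}

print(group_spans("xxabcxx"))
# {'match': (2, 5), 'outer': (2, 5), 'inner': (2, 4)}
```

The inner span (2, 4) lies inside the outer span (2, 5); groups nest, they never partially overlap.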

Note: in the regex
  (ab)|(cd)
the parentheses are redundant (unless you want to know
which alternative matched). Using "|" makes the two
(or more) parts mutually exclusive (disjunctive), i.e.
either the whole of one sub-expression or the whole of
the other is matched.
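A sketch in Python's re module (again a stand-in, not J; which_alternative is a hypothetical name) showing both points: the match text is the same with or without the parentheses, and the groups only serve to report which alternative matched, via Match.lastindex.

```python
import re

WITH_GROUPS = re.compile(r"(ab)|(cd)")
WITHOUT_GROUPS = re.compile(r"ab|cd")

def which_alternative(text):
    """Return (group number, group 1, group 2) for the first match, or None."""
    m = WITH_GROUPS.search(text)
    if m is None:
        return None
    # lastindex is the number of the last capturing group that matched.
    # Only one alternative can take part in a match, so the other group is None.
    return m.lastindex, m.group(1), m.group(2)

print(which_alternative("xxcd"))  # (2, None, 'cd')
print(which_alternative("abcd"))  # (1, 'ab', None)
# Without the parentheses the matched text is identical; only the group info is lost.
print(WITHOUT_GROUPS.search("xxcd").group())  # cd
```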

What rxmatches does is find one match after another:
when a match is found, it consumes the input, and
the next match starts after the last consumed character.
So the matches are non-overlapping by definition.
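The same consume-and-continue behaviour can be seen in Python's re.finditer, used here only as an analogy for rxmatches (match_starts is a hypothetical helper): scanning resumes after the end of each match, so "aa" is found twice in "aaaa", not three times.

```python
import re

def match_starts(pattern, text):
    """Start positions of successive non-overlapping matches."""
    # finditer resumes scanning at the end of each match,
    # so no character belongs to two matches.
    return [m.start() for m in re.finditer(pattern, text)]

print(match_starts("aa", "aaaa"))    # [0, 2]
print(match_starts("aba", "ababa"))  # [0]
```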

I am not even familiar with the concept of "overlapping"
matches. In general-purpose parsing, each character can
belong to at most one match.

Is there any theory, reference, or code sample for
"overlapping" matches?
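For reference, the restart technique described in the quoted Perl discussion (search again starting one character after where the previous match began) can be sketched with Python's Pattern.search(text, pos). This is an assumption-laden illustration, not J code and not Perl's own API; overlapping_matches is a hypothetical name.

```python
import re

def overlapping_matches(pattern, text):
    """Return (start, matched text) for matches that may overlap.

    After each match, the search restarts one character past
    where that match began, instead of past where it ended.
    """
    regex = re.compile(pattern)
    results = []
    pos = 0
    # Bounding pos also guarantees termination for patterns that match "".
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        results.append((m.start(), m.group()))
        pos = m.start() + 1
    return results

print(overlapping_matches("aa", "aaaa"))
# [(0, 'aa'), (1, 'aa'), (2, 'aa')]
print(overlapping_matches("aba", "ababa"))
# [(0, 'aba'), (2, 'aba')]
```

A common alternative in Python is a zero-width lookahead, which consumes no input: re.findall(r"(?=(aa))", "aaaa") returns ['aa', 'aa', 'aa'].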


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
