[darcs-users] Suggestion: replace with regular expressions?

Albert Reiner Sun, 11 Dec 2005 09:30:15 -0800

In general, I like the idea of `darcs replace` a lot.  The problem I
encounter every now and then is that of specifying the tokens in a
reasonable way: simply listing the allowed token characters seems to
work well for simple cases like C, Haskell, or Fortran.


However, it is clearly not satisfactory for other types of text: e.g.,

- TeX, LaTeX etc., where a token might be, e.g., a backslash followed
  by characters from a given class; or where ``the na\"\i ve boy''
  should probably be seen as three tokens, not four;

- or Common Lisp (where any string can be used as a symbol name,
  though you might have to enclose it in |...|, and where certain
  characters cannot appear at the beginning of a symbol);

- or literate programs where, e.g., with noweb a token might be
  anything enclosed in << and >>, although single < or > characters in
  between are fine.

All of these could conveniently be expressed if one were able to use
real regular expressions for the token, rather than for every single
character of the token.

As was pointed out on this list earlier, the problem with such token
regular expressions is that they break invertibility, which darcs
requires.

However, I think this can be remedied by introducing two regular
expressions, RE1 and RE2, that can be used for locating the position
and extent of the replacement before and after the replacement,
respectively.

So, upon replacing OLD by NEW,

- RE1 is used to find OLD;

- tentatively, OLD is replaced by NEW;

- the result of the replacements is checked against the combination of
  RE2 and NEW, which must accurately pinpoint the replacements: RE2
  must find NEW at every position where OLD was replaced, and at no
  other position (no spurious instances);

- if so, accept the replacement; otherwise, reject it.

The inverted patch is obtained by simply switching OLD <-> NEW, and
RE1 <-> RE2.
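
The steps above, including the inversion, can be sketched in a few
lines of Python.  This is only an illustration under my own
assumptions, not darcs code: the function names (token_sites,
replace_tokens, invert) are hypothetical, a "token" is taken to be a
non-overlapping left-to-right match of the regular expression, and a
rejected replacement is signalled by returning None.

```python
import re

def token_sites(pattern, token, text):
    """Spans of the regex matches in `text` whose matched text is exactly `token`."""
    return [m.span() for m in re.finditer(pattern, text) if m.group(0) == token]

def replace_tokens(re1, re2, old, new, text):
    """Replace OLD by NEW; return the new text, or None if the check fails."""
    pieces = []      # fragments of the output text
    expected = []    # spans in the output where NEW was written
    pos = 0          # read position in the input
    out_len = 0      # length of the output built so far
    for start, end in token_sites(re1, old, text):
        pieces.append(text[pos:start])
        out_len += start - pos
        expected.append((out_len, out_len + len(new)))
        pieces.append(new)
        out_len += len(new)
        pos = end
    pieces.append(text[pos:])
    result = "".join(pieces)
    # RE2 together with NEW must pinpoint exactly the replaced positions.
    if token_sites(re2, new, result) != expected:
        return None
    return result

def invert(re1, re2, old, new):
    """Arguments of the inverse patch: swap OLD <-> NEW and RE1 <-> RE2."""
    return re2, re1, new, old
```

With RE1 = RE2 = [A-Za-z_0-9]+, replacing "foo" by "baz" in
"foo = foo_bar + foo" is accepted and leaves "foo_bar" alone, and
applying the inverted arguments to the result restores the original
text.  The same replacement in "foo + baz" is rejected, because the
"baz" that was already there would be a spurious instance.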

In most cases I would expect both REi to be the same, so that RE2
might default to RE1; and if --token-chars is given, RE1 is trivially
constructed out of that.
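
A possible way to express these defaults, again as a hypothetical
Python sketch: the helper names are made up, the character class
"[A-Za-z_0-9]" is used only as an illustrative value, and I assume
the --token-chars argument is a bracketed class that is also valid in
the regex dialect at hand.

```python
import re

def regex_from_token_chars(token_chars):
    """Turn a bracketed class such as "[A-Za-z_0-9]" into a token regex."""
    if not (token_chars.startswith("[") and token_chars.endswith("]")):
        raise ValueError("expected a bracketed character class")
    # A token is a maximal run of characters from the class.
    return token_chars + "+"

def default_patterns(re1=None, re2=None, token_chars="[A-Za-z_0-9]"):
    """Fill in RE1 from the token characters and let RE2 default to RE1."""
    if re1 is None:
        re1 = regex_from_token_chars(token_chars)
    if re2 is None:
        re2 = re1
    return re1, re2
```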

As an example, consider the noweb chunk identifier syntax:  With

    RE1 = RE2 = <<[^<>]+([<>][^<>]+)*>>

the replacement of "<<foo>>" with "<<bar baz>>" always succeeds and is
invertible; OTOH, a replacement with "bar baz" would not succeed
because of RE2.
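
The matching behaviour claimed here can be checked quickly in Python.
This is a small sketch, not part of darcs: the names NOWEB_TOKEN,
is_token and tokens are made up, and the inner group of the pattern is
written as optional (`*`) so that chunk names without any single < or
> inside, such as "<<foo>>", are matched as well.

```python
import re

# Chunk identifier: << ... >>, with single < or > allowed in between.
NOWEB_TOKEN = r"<<[^<>]+([<>][^<>]+)*>>"

def is_token(pattern, candidate):
    """True if the whole candidate string is one token under `pattern`."""
    return re.fullmatch(pattern, candidate) is not None

def tokens(pattern, text):
    """All non-overlapping tokens found in `text`, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]
```

Both "<<foo>>" and "<<bar baz>>" pass is_token, so either can serve
as OLD or NEW; the bare string "bar baz" does not, which is why the
check against RE2 rejects it as a replacement.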

Comments?

Should one open a wishlist item?

Albert.


