Sergey Gromov wrote:
Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:
Sergey Gromov wrote:
Well, I think it's hard to create a regular expression engine flexible
enough to allow arbitrary highlighting.
I can't see how it can be at all complicated to find the beginning and
end of a C string or character literal.
This (Posix?) regexp
"(\\.|[^\\"])*"
works when I try it (though not in the tiny subset of Posix regexps that N++
understands). But that's an aside - you don't need regexps at all to
get it working at this basic level, only a rudimentary concept of escape
sequences.
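To make the quoted regexp concrete, here is a minimal sketch in Python (my choice of language, not the thread's). Python's `re` accepts the pattern unchanged; the function name `find_c_strings` and the sample input are illustrative only.

```python
import re

# The regexp from the post: an opening quote, then any run of escape
# sequences (\\.) or characters that are neither backslash nor quote,
# then a closing quote.
C_STRING = re.compile(r'"(\\.|[^\\"])*"')


def find_c_strings(source: str) -> list[str]:
    """Return every double-quoted literal in source, escapes included."""
    return [m.group(0) for m in C_STRING.finditer(source)]


sample = r'puts("say \"hi\""); puts("a\\"); x = 1;'
for literal in find_c_strings(sample):
    print(literal)
```

The two literals show why the escape alternative matters: `\"` must not end the first string, while the escaped backslash in `"a\\"` must not stop the following quote from closing the second.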
I think the best such engine
I've seen was Colorer by Igor Russkih, and even there I wasn't able to
express D's WYSIWYG or delimited strings. You need a real programming
language for that.
For WYSIWYG strings, all that's needed is a generic highlighter that
supports:
- the aforementioned string escapes
- multiple types of string literals distinguished by whether they
support string escapes, and not just delimiters
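A highlighter with just those features needs no regexps at all, only a table of literal kinds and a loop. The following is a hypothetical Python sketch: the `KINDS` table, the kind names and `scan_literals` are made up for illustration, and it deliberately ignores comments and does not check whether an `r` is really the tail of an identifier.

```python
# Hypothetical table of literal kinds: opening prefix, kind name,
# closing character, and whether backslash escapes are honoured.
# Longer prefixes come first so r"..." is not mistaken for "...".
KINDS = [
    ('r"', "wysiwyg", '"', False),
    ("`", "wysiwyg", "`", False),
    ('"', "string", '"', True),
    ("'", "char", "'", True),
]


def scan_literals(src: str) -> list[tuple[str, str]]:
    """Return (kind, text) for every literal in src, without regexps."""
    found = []
    i = 0
    while i < len(src):
        for prefix, kind, close, escapes in KINDS:
            if src.startswith(prefix, i):
                j = i + len(prefix)
                while j < len(src) and src[j] != close:
                    # Only escape-aware literals let a backslash hide the
                    # next character (including a closing delimiter).
                    j += 2 if escapes and src[j] == "\\" else 1
                found.append((kind, src[i:j + 1]))
                i = j + 1
                break
        else:
            i += 1  # not the start of a literal: move on
    return found


sample = r'''a = "x\"y"; b = `c:\dir`; c = r"\n"; d = '\n';'''
for kind, text in scan_literals(sample):
    print(kind, text)
```

The only thing separating `"..."` from `` `...` `` and `r"..."` here is the `escapes` flag, which is exactly the distinction the list above asks for.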
TextPad's syntax highlighting engine manages 2/3 of this without any
regexps (or anything to that effect). That said, I've just found that
it can do a little of what remains: I can make it handle `...` (though
not r"...") at the expense of no longer distinguishing string literals
from character literals.
But token-delimited strings are indeed more complex to deal with. (How
many people are actually putting them to practical use at the moment,
for that matter?)
Well, you can write a regexp to handle a simple C string. That is,
provided your regexp is matched against the whole file, which is usually
not the case. Otherwise you'll have trouble with a C string like:
"foo\
bar"
or a D string like:
"foo
bar"
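A small Python sketch shows the difference between matching line by line and matching the whole buffer. It reuses the regexp quoted earlier in the thread, compiled with `re.DOTALL` (an addition of mine) so that `.` after a backslash can consume the newline of a C line continuation; the helper names are hypothetical.

```python
import re

# The thread's C-string regexp; re.DOTALL lets "." after a backslash
# consume a newline, which a C line continuation needs.
C_STRING = re.compile(r'"(\\.|[^\\"])*"', re.DOTALL)

c_source = 'char *s = "foo\\\nbar";'  # backslash-newline continuation
d_source = 'auto s = "foo\nbar";'      # D allows a raw newline inside "..."


def per_line(text: str) -> list[str]:
    """Match each line on its own, as many highlighters do."""
    return [m.group(0) for line in text.splitlines()
            for m in C_STRING.finditer(line)]


def whole_file(text: str) -> list[str]:
    """Match against the complete buffer."""
    return [m.group(0) for m in C_STRING.finditer(text)]


for src in (c_source, d_source):
    print(per_line(src), whole_file(src))
```

Per line, neither literal is found at all, because no single line contains both quotes; against the whole buffer both are found.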
Then you want to highlight string escapes, and probably format
specifiers, inside the literal. So you need not simple regexps but
hierarchies of them, and you also need to know where the *internals* of
the string start and end.
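One way to picture such a hierarchy, sketched in Python under my own assumptions: an outer regexp finds each literal and captures its internals in a group, and an inner regexp runs only over that group's span. The format-specifier pattern is a simplified printf subset I chose for illustration, not a complete one; `highlight_spans` is a hypothetical name.

```python
import re

# Outer level: the whole literal, with group 1 holding only its internals.
STRING = re.compile(r'"((?:\\.|[^\\"])*)"', re.DOTALL)
# Inner level, applied only inside a literal's internals. The format part
# is a simplified printf subset (an assumption, not the full grammar).
INNER = re.compile(
    r'(?P<escape>\\.)|(?P<format>%[-+ #0]*\d*(?:\.\d+)?[diouxXeEfgGcsp%])',
    re.DOTALL,
)


def highlight_spans(source: str) -> list[tuple[str, int, int]]:
    """Return (kind, start, end) spans: 'string' for each literal, followed
    by the 'escape'/'format' spans nested inside its internals."""
    spans = []
    for s in STRING.finditer(source):
        spans.append(("string", s.start(), s.end()))
        inner_start, inner_end = s.span(1)
        # pos/endpos confine the inner search to the internals while
        # keeping offsets relative to the full source.
        for m in INNER.finditer(source, inner_start, inner_end):
            spans.append((m.lastgroup, m.start(), m.end()))
    return spans


print(highlight_spans(r'printf("%5.2f\n", x);'))
```

Without the captured internals the inner regexp would have no way of knowing that a `%` outside the quotes is just the modulo operator.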
Then you have r"foo" which probably can be handled with regexps.
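For r"foo" a single regexp does seem enough, since there are no escapes: everything up to the next double quote is content. A Python sketch; the `\b` guard against identifiers that merely end in `r` is my own addition.

```python
import re

# WYSIWYG literal: no escapes, so anything but a double quote is content.
# \b keeps an identifier that merely ends in "r" (bar"...") from matching.
RAW_STRING = re.compile(r'\br"[^"]*"')

src = r'auto p = r"C:\temp\new"; auto q = "x";'
print(RAW_STRING.findall(src))
```

Note that the backslashes inside the raw literal are ordinary characters here, which is exactly what the escape-aware C pattern gets wrong for such strings.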
Then you have q"/foo/" where "/" can be anything. This can still be
handled by extended regexps with backreferences, even though they are no
longer regular expressions in the strict theoretical sense.
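In Python's extended syntax this is a capture plus a backreference: the first character after `q"` is captured and `\1` demands the same character before the closing quote. The exclusion of word characters, whitespace and opening brackets from the delimiter class is a simplification of mine (brackets are the next case, identifier delimiters are not handled).

```python
import re

# q"/foo/": capture the delimiter, then require the same character (\1)
# right before the closing quote. Backreferences are an extension that
# formal regular expressions do not have.
DELIMITED = re.compile(r'\bq"([^\w\s(\[<{])(.*?)\1"', re.DOTALL)

src = 'auto a = q"/foo/"; auto b = q"!a"b!";'
print([(m.group(1), m.group(2)) for m in DELIMITED.finditer(src)])
```

The second literal shows that a bare `"` inside the content does not end it; only the delimiter followed by `"` does.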
Then you have q"{foo}" where "{" and "}" can be any of the pairs (),
[], <> or {}. A regexp cannot translate a captured opening bracket into
its closing counterpart when substituting it back, so you must write a
separate regexp for every bracket pair.
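A Python sketch of that workaround: since a backreference can only repeat the exact text it captured, the alternatives are generated from a table of pairs, one per bracket kind. `PAIRS` and `bracketed_contents` are hypothetical names, and the sketch makes no attempt to count nested brackets inside the literal.

```python
import re

# A captured "{" can never stand in for "}", so spell out one
# alternative per bracket pair, generated here from a table.
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}

BRACKETED = re.compile(
    "|".join(
        r'\bq"' + re.escape(opening) + r"(.*?)" + re.escape(closing) + '"'
        for opening, closing in PAIRS.items()
    ),
    re.DOTALL,
)


def bracketed_contents(src: str) -> list[str]:
    """Return the contents of each q"(...)"-style literal in src."""
    # Each alternative has its own group; exactly one of them took part.
    return [next(g for g in m.groups() if g is not None)
            for m in BRACKETED.finditer(src)]


print(bracketed_contents('q"(a)" q"[b]" q"<c>" q"{d}"'))  # ['a', 'b', 'c', 'd']
```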
Remember that the whole point of q{} strings was that they should NOT be
highlighted as strings!
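Treating q{} as tokens means the highlighter only needs the extent of the literal, then hands the contents back to its ordinary tokenizer. Finding that extent takes a brace counter, which a plain regexp cannot provide. The sketch below is a Python illustration under a stated simplification: braces inside nested string literals or comments are not special-cased, and `token_string_extent` is a hypothetical name.

```python
def token_string_extent(src: str, start: int) -> tuple[int, int]:
    """Given that src[start:] begins with 'q{', return the (begin, end)
    offsets of the token string's contents, so they can be fed back to the
    normal tokenizer instead of being painted as one string.

    Simplification (assumption): braces inside nested string literals or
    comments are counted like any other brace.
    """
    if not src.startswith("q{", start):
        raise ValueError("not a token string")
    depth = 0
    for i in range(start + 1, len(src)):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:  # matched the opening brace of q{
                return start + 2, i
    raise ValueError("unterminated token string")


src = "mixin(q{ if (x) { y(); } });"
begin, end = token_string_extent(src, src.index("q{"))
print(repr(src[begin:end]))
```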