The puzzle
I spent all yesterday trying to get colorizing for Python
and C to work. Several minor bugs were confusing matters,
but as I stamped out the simple bugs it began to dawn on me
that there is a tricky puzzle involved in handling languages
like C and PHP that use jEdit delegates. The more I think of
the puzzle, the more interesting it becomes.
Consider this (strange!) C code (contained at present in test.leo)::
@language c
#define WIPEOUT 0 /*
* comment line 1.
* comment line 2
*/
#include "eeprom.h"
What's so strange about this, you ask? Well, C preprocessor
directives stop at the end of the line, but wait, the C
comment starts in the preprocessor line and continues three
more lines! The jEdit pattern matcher handles this by
colorizing the line::
#define WIPEOUT 0 /*
with a so-called delegate. That is, the "top-level" pattern
matcher discovers the line above, and then "calls" the
c::cpp delegate to colorize it. Typically, the delegate just
colorizes everything blue (regardless of whether it looks
like a C construct). However, the c::cpp delegate "extends"
the preprocessor line in order to colorize complete comments
in red. This colorizing is important, as it indicates that
comments are indeed comments, even if they start in a
preprocessor line!
Already, I think you can start to see problems in a
line-oriented environment. But before you jump to
conclusions about possible solutions to this puzzle,
consider this example::
#define /* a comment */ WIPEOUT 0
Thought experiments and other mental tools
Trust me, this is a worthy puzzle. I'm not going to go into
all the (foolish) ideas I tried yesterday. Suffice it to
say, no trivial solutions have any chance. It might be,
however, that some clever change in point of view will allow
some small (but probably tricky) code to handle all the
cases.
As a thought experiment, we could consider rewriting (all)
the jEdit language description files so that they don't use
delegates. We don't want to do that, but notice that if we
did rewrite the rules
**and left the meaning of the jEdit patterns unchanged**,
we could, in fact, use all the *old* jEdit pattern matchers (including
those in threading_colorizer.py) unchanged.
OTOH, it's not at all clear how to get rid of delegates even
if we were free to do so!
Let me try to explain the essence of the problem. If Leo is
to use the language descriptions in the leo/modes directory,
the pattern matchers must conform to the assumptions
embodied in those language descriptions. But those
descriptions are stated in terms of general patterns, not
line-oriented patterns. Without delegates, the restarter
methods properly change the frame of reference from general
strings to lines. But in the presence of delegates, it
becomes much harder to encapsulate the state of what is, in
effect, a call stack of pointers into strings that can cross
line boundaries.
As stated above, we are free, as a thought experiment, to
consider rewriting the language description files. The
obvious idea is to recast all the colorizers in terms of
line-oriented rules. If we could do that, we might be able
to see the Aha that will let us shift the frame of reference of
the **unchanged** pattern matchers to line-oriented code.
But to repeat, it's not so clear how to do even this. Look
again at the original example::
@language c
#define WIPEOUT 0 /*
* comment line 1.
* comment line 2
*/
#include "eeprom.h"
The top-level pattern matcher discovers::
#define WIPEOUT 0 /*
What on earth are we to do with this? I suppose we could
invent a new kind of matcher that stops at /* or the
end-of-line, whichever comes first. This hack would work for
this *particular* example. The pattern matcher would color
in blue everything up to the /*. The top-level pattern for
comment-lines would then fire.
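Here is a minimal sketch of such a matcher, just to make the hack
concrete. The name match_cpp_prefix and its signature are made up for
illustration; nothing like it exists in threading_colorizer.py or in
the jEdit mode files. It also ignores the case of /* appearing inside
a string literal on the preprocessor line.

```python
def match_cpp_prefix(line, i=0):
    """Hypothetical matcher for the hack described above.

    If line[i:] starts a preprocessor directive, return the index at
    which the "blue" span ends: the first '/*' or the end of the line,
    whichever comes first. Return i (an empty match) otherwise.
    """
    if not line.startswith('#', i):
        return i  # Not a preprocessor line: no match.
    j = line.find('/*', i)
    return len(line) if j == -1 else j


line = '#define WIPEOUT 0 /*'
end = match_cpp_prefix(line)
print(repr(line[:end]))  # The part colored as a preprocessor directive.
print(repr(line[end:]))  # Left for the top-level comment rule.
```

Everything before the returned index would be colored blue, and the
top-level comment rule would start at that index.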
This hack is unsatisfying because it doesn't provide any
*general* insight about how to deal with delegates. Still,
it proves that a solution is feasible, provided we extend
threading_colorizer.py (and, if we care, the old qt
colorizer code) to handle the new kind of pattern matcher.
We have an existence proof that at least the c::cpp delegate
can be eliminated.
Or do we? Oh my. What state do we enter after handling the
C comment? In the first example, we must enter the default C
state. In the second example, we must re-enter the preprocessor
state. The only difference is whether the C comment
extends beyond the end of the line!
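To make the two cases concrete, here is a toy line-oriented colorizer
that hard-codes exactly the rule just stated: a comment that closes on
the same line resumes the state it started in, while a comment carried
over from a previous line always resumes the default state. The
function color_line and the state names 'default', 'cpp' and 'comment'
are invented for this sketch. It is not a general treatment of
delegates, and it knows nothing about strings or other C constructs.

```python
def color_line(line, state='default'):
    """Split one line into (kind, text) spans.

    `state` is what the previous line handed over: 'default' or
    'comment'. Returns (spans, state_for_the_next_line).
    """
    spans = []
    i = 0
    if state == 'comment':
        # Continue a comment that began on an earlier line.
        k = line.find('*/')
        if k == -1:
            return [('comment', line)], 'comment'
        spans.append(('comment', line[:k + 2]))
        i = k + 2
        # The comment crossed a line boundary, so (by the rule above)
        # any directive it started in is over.
        state = 'default'
    elif line.lstrip().startswith('#'):
        state = 'cpp'
    while i < len(line):
        j = line.find('/*', i)
        end = len(line) if j == -1 else j
        if end > i:
            spans.append((state, line[i:end]))
        if j == -1:
            break
        k = line.find('*/', j + 2)
        if k == -1:
            # The comment runs past the end of this line.
            spans.append(('comment', line[j:]))
            return spans, 'comment'
        # The comment closed on this line: stay in the current state.
        spans.append(('comment', line[j:k + 2]))
        i = k + 2
    # Preprocessor directives stop at the end of the line.
    return spans, 'default'


state = 'default'
for s in ['#define WIPEOUT 0 /*', '* comment line 1.', '* comment line 2',
          '*/', '#include "eeprom.h"']:
    spans, state = color_line(s, state)
    print(spans)
print(color_line('#define /* a comment */ WIPEOUT 0')[0])
```

Note that the only thing passed between lines here is a single state
name. That is enough for these two examples, but it says nothing about
how to carry a general stack of delegates across line boundaries.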
Maybe now you can see how tricky this all is.
That's all for now. I was hoping that writing this would
spur some new ideas. It may happen, but not yet :-)
Your comments and thoughts are always welcome.
Edward