This Engineering Notebook post discusses improving Leo's importers for
difficult-to-parse languages such as c++ and javascript. Issue #3327
<https://github.com/leo-editor/leo-editor/issues/3327> has become urgent
now that I have begun to study codon!
*tl;dr:* Aha: use *helper lines* to guide analysis.
*Background*
Leo's importers have a long history. We are on something like the fifth
iteration of their design. Each iteration has been a step forward, but
Leo's c++ and javascript importers still need more work.
Definitions of c++ functions or methods may be arbitrarily complex. For
example, *processSource* in codon/codon/app/main.cpp starts this way:
    std::unique_ptr<codon::Compiler> processSource(
        const std::vector<const char *> &args, bool standalone,
        std::function<bool()> pyExtension = [] { return false; }) {
Note how the lambda body { return false; } appears inside the parameter list!
*Aside*: I wonder whether codon generated this file! It's certainly
difficult to read: everything is over-qualified.
*The problem*
The importer must split lines into nodes. Every line must appear in exactly
one generated node. The bodies of the resulting nodes must *tile* the
original file.
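The tiling requirement can be stated as a simple check. The following is only a sketch, not Leo's actual code: the function name `tiles` and the representation of node bodies as lists of lines are assumptions made for illustration.

```python
def tiles(original_lines, node_bodies):
    """Return True if the node bodies, concatenated in order, reproduce the file.

    Hypothetical helper: each body is assumed to be a list of lines.
    """
    joined = [line for body in node_bodies for line in body]
    return joined == original_lines


lines = ["int f() {\n", "  return 1;\n", "}\n", "int g() { return 2; }\n"]

# Two nodes that together cover every line exactly once.
good = [lines[0:3], lines[3:4]]

# A split that drops the closing brace of f.
bad = [lines[0:2], lines[3:4]]

assert tiles(lines, good)
assert not tiles(lines, bad)
```

A check like this is cheap enough to run after every import, so a faulty split is caught before it can lose lines.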
Handling the file line-by-line ensures that the generated nodes tile the
file. However, a line-oriented approach complicates analysis. I'll omit
most of the details.
Leo's importers tokenize the file so that strings and comments do not
confuse the analysis. Alas, handling tokens creates *other* complications.
What are we to do?
*Aha!* Let's use *helper lines* to simplify the analysis. We'll create the
helper lines as follows:
- Start with the lines from the original file.
- Remove comments and strings.
- Remove curly brackets associated with 'if', 'for', and 'while' statements.
- Check the result to ensure that parens and brackets are properly nested.
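As a rough illustration of the steps above, here is a minimal sketch of the comment/string removal and the nesting check. It is not the planned importer: the names `make_helper_lines` and `is_balanced` are hypothetical, and the step that removes the curly brackets of 'if', 'for', and 'while' statements is omitted. The sketch assumes that ordinary string literals do not span lines, and it ignores raw strings and digit separators such as 1'000.

```python
def make_helper_lines(lines):
    """Strip comments and string/char literals, one helper line per source line."""
    result = []
    state = None  # None, 'block' (inside /* ... */), or the open quote character.
    for line in lines:
        out = []
        i = 0
        while i < len(line):
            ch = line[i]
            pair = line[i:i + 2]
            if state == 'block':
                if pair == '*/':
                    state = None
                    i += 2
                else:
                    i += 1
            elif state in ('"', "'"):
                if ch == '\\':
                    i += 2  # Skip the escaped character.
                else:
                    if ch == state:
                        state = None
                    i += 1
            elif pair == '//':
                break  # The rest of the line is a comment.
            elif pair == '/*':
                state = 'block'
                i += 2
            elif ch in ('"', "'"):
                state = ch
                i += 1
            else:
                out.append(ch)
                i += 1
        if state in ('"', "'"):
            state = None  # Assumption: string literals end on their own line.
        result.append(''.join(out).rstrip())
    return result


def is_balanced(helper_lines):
    """Return True if parens, square brackets and curly brackets nest properly."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for line in helper_lines:
        for ch in line:
            if ch in '([{':
                stack.append(ch)
            elif ch in pairs:
                if not stack or stack.pop() != pairs[ch]:
                    return False
    return not stack


src = [
    'int f(const char *s = "}") { // {\n',
    '  /* { */ return 0;\n',
    '}\n',
]
helper = make_helper_lines(src)
assert len(helper) == len(src)  # Helper line i corresponds to source line i.
assert is_balanced(helper)
assert not is_balanced(src)     # The raw text is confused by the string "}".
```

Keeping exactly one helper line per source line matters: the analysis runs on the helper lines, but the nodes are built from the original lines at the same indices.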
The resulting lines will be much easier to analyze. The importer can assume
that any remaining *top-level* curly brackets start the body of a class,
function, or struct. The tiling problem remains challenging but tractable.
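To make the top-level idea concrete, here is a small sketch that finds curly brackets lying outside all parentheses and all other curly brackets. The function name `top_level_blocks` is hypothetical, and the input is assumed to be helper lines from which comments and strings have already been removed. The function body in the example is a stand-in, not the real codon code.

```python
def top_level_blocks(helper_lines):
    """Return (open_line, close_line) index pairs for top-level curly brackets."""
    blocks = []
    paren_depth = 0   # Nesting depth of () and [].
    brace_depth = 0   # Counts only curly brackets seen outside parentheses.
    open_line = None
    for i, line in enumerate(helper_lines):
        for ch in line:
            if ch in '([':
                paren_depth += 1
            elif ch in ')]':
                paren_depth -= 1
            elif paren_depth > 0:
                continue  # Ignore braces in parameter lists, e.g. lambda bodies.
            elif ch == '{':
                if brace_depth == 0:
                    open_line = i
                brace_depth += 1
            elif ch == '}':
                brace_depth -= 1
                if brace_depth == 0:
                    blocks.append((open_line, i))
    return blocks


helper = [
    'std::unique_ptr<codon::Compiler> processSource(',
    '    const std::vector<const char *> &args, bool standalone,',
    '    std::function<bool()> pyExtension = [] { return false; }) {',
    '  return nullptr;',  # Stand-in body, not the real code.
    '}',
]
blocks = top_level_blocks(helper)
assert blocks == [(2, 4)]
```

The { return false; } inside the parameter list is skipped because the paren depth is still positive there. Only the final { on the third line counts as the start of the function body. Finding methods inside a class body would require repeating the same scan one level down.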
*Summary*
I plan to rewrite the c++ importer as suggested above. Helper lines will
likely eliminate the need for the usual tokenizer and state stack.
Edward