This Engineering Notebook post discusses improving Leo's importers for 
difficult-to-parse languages such as c++ and javascript. Issue #3327 
<https://github.com/leo-editor/leo-editor/issues/3327> has become urgent 
now that I have begun to study codon! 

*tl;dr:* Aha: use *helper lines* to guide analysis. 

*Background*

Leo's importers have a long history. We are on something like the fifth 
iteration of their design. Each iteration has been a step forward, but 
Leo's c++ and javascript importers need more work.

Definitions of c++ functions or methods may be arbitrarily complex. For 
example, *processSource* in codon/codon/app/main.cpp starts this way:

std::unique_ptr<codon::Compiler> processSource(
    const std::vector<const char *> &args, bool standalone,
    std::function<bool()> pyExtension = [] { return false; }) {

Note how { return false; }, the body of a lambda used as a default 
argument, appears inside the parameter list!

*Aside*: I wonder whether codon generated this file! It's certainly 
difficult to read: everything is over-qualified.

*The problem*

The importer must split lines into nodes. Every line must appear in exactly 
one generated node. The bodies of the resulting nodes must *tile* the 
original file.
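
The tiling requirement can be stated as a small check. This is only a 
sketch: it assumes the file is a list of lines and each node body is a list 
of lines, and the name *tiles* is hypothetical, not part of Leo's API.

```python
def tiles(lines, node_bodies):
    """Return True if the node bodies, concatenated in order,
    reproduce the original lines exactly: no line is lost,
    duplicated, or reordered."""
    joined = [line for body in node_bodies for line in body]
    return joined == lines


original = ['int f() {\n', '  return 1;\n', '}\n', 'int g();\n']
good = [original[:3], original[3:]]  # every line in exactly one node
bad = [original[:3]]                 # the last line is missing
```

Here tiles(original, good) is True and tiles(original, bad) is False.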

Handling the file line-by-line ensures that the generated nodes tile the 
file. However, a line-oriented approach complicates analysis. I'll omit 
most of the details.

Leo's importers tokenize the file so that strings and comments do not 
confuse the analysis. Alas, handling tokens creates *other* complications. 
What are we to do?

*Aha!* Let's use *helper lines* to simplify the analysis. We'll create the 
helper lines as follows:

- Start with the lines from the original file.
- Remove comments and strings.
- Remove curly brackets associated with 'if', 'for', and 'while' statements.
- Check the result to ensure that parens and brackets are properly nested.

The resulting lines will be much easier to analyze. The importer can assume 
that any remaining *top-level* curly brackets start the body of a class, 
function, or struct. The tiling problem remains challenging but tractable.
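
Here is a rough Python sketch of the idea, not the planned implementation. 
All function names are hypothetical. It assumes that "top-level" means 
outside all parens and curly brackets, it blanks comments and strings with 
spaces so that helper line i still corresponds to source line i, and it 
skips the step that removes the brackets of 'if', 'for', and 'while' 
statements. Raw strings and digit separators are not handled. The two-line 
function body in the sample source is invented for the example.

```python
def make_helper_lines(lines):
    """Return one helper line per source line, with comments and
    string/char literals replaced by spaces. Lengths are preserved."""
    result, in_block_comment = [], False
    for line in lines:
        body = line.rstrip('\n')
        newline = line[len(body):]
        out, i, n = [], 0, len(body)
        while i < n:
            ch = body[i]
            if in_block_comment:
                if body.startswith('*/', i):
                    in_block_comment = False
                    out.append('  ')
                    i += 2
                else:
                    out.append(' ')
                    i += 1
            elif body.startswith('//', i):
                out.append(' ' * (n - i))  # Blank the rest of the line.
                i = n
            elif body.startswith('/*', i):
                in_block_comment = True
                out.append('  ')
                i += 2
            elif ch in '"\'':
                # Blank the literal, skipping backslash escapes.
                j = i + 1
                while j < n and body[j] != ch:
                    j += 2 if body[j] == '\\' else 1
                j = min(j + 1, n)
                out.append(' ' * (j - i))
                i = j
            else:
                out.append(ch)
                i += 1
        result.append(''.join(out) + newline)
    return result


def check_nesting(helper_lines):
    """Return True if parens, square brackets and curly brackets nest properly."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for line in helper_lines:
        for ch in line:
            if ch in '([{':
                stack.append(ch)
            elif ch in pairs:
                if not stack or stack.pop() != pairs[ch]:
                    return False
    return not stack


def top_level_brace_lines(helper_lines):
    """Return the indices of lines containing a '{' that is outside
    all parens and all other curly brackets."""
    found, paren, brace = [], 0, 0
    for i, line in enumerate(helper_lines):
        for ch in line:
            if ch == '(':
                paren += 1
            elif ch == ')':
                paren -= 1
            elif ch == '{':
                if paren == 0 and brace == 0:
                    found.append(i)
                brace += 1
            elif ch == '}':
                brace -= 1
    return found


source = [
    'std::unique_ptr<codon::Compiler> processSource(\n',
    '    const std::vector<const char *> &args, bool standalone,\n',
    '    std::function<bool()> pyExtension = [] { return false; }) {\n',
    '  return nullptr;  // An invented body with a "}" in a comment.\n',
    '}\n',
]
helper = make_helper_lines(source)
```

For this sample, check_nesting(helper) succeeds and 
top_level_brace_lines(helper) reports only the third line: the lambda's 
curly brackets are inside the parameter list, so they are not top-level. 
Because helper lines match source lines one-to-one, the importer can 
analyze the helper lines and still allocate the original lines to nodes.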

*Summary*

I plan to rewrite the c++ importer as suggested above. Helper lines will 
likely eliminate the need for the usual tokenizer and state stack.

Edward
