ENB: Code generation for javascript

Edward K. Ream Fri, 22 Dec 2017 12:50:04 -0800

This is an Engineering Notebook post. It will be of interest only to Leo's 
core devs.  It discusses questions relating to #639: Javascript importer 
should use @others everywhere 
<https://github.com/leo-editor/leo-editor/issues/639>.


The goal is not in doubt. We all agree that the js importer should never 
generate section references.  Two important questions immediately arise:

1. What, exactly, should the js importer generate?
2. How should the js importer generate the desired code?

We can't answer question 2 until we are completely clear about the answer to 
question 1.  But as we shall see, the answer to question 2 probably follows 
from the details of the answer to question 1.


*Background*
"People have differences of opinion...Every design or implementation choice 
carries a trade-off and numerous costs. There is seldom a right answer." 
(Mozilla-central code of conduct)

There can be no better summary of this topic ;-)

1. Until recently, Javascript has had *no* syntax that organizes functions 
into classes.  Instead, each organization chooses its own way (or ways!) of 
defining the *stand-ins* for classes. Yes, there are a number of standard 
patterns, but I would strongly prefer that the js importer "just work" 
regardless of pattern.

2. The question about how faithfully to recreate (round-trip) existing code 
is fraught with unexpected and vexing difficulties.

We cannot assume that the js sources are consistently indented. Some of 
the sources I have seen have had almost random indentation.  Otoh, I think 
it is reasonable to assume that the js sources are not minified. But even 
this isn't cast in stone.

Automatically "beautifying" the code would, imo, be a throwback to old 
code. Leo's old importers attempted quasi parses of the text.  They worked 
much like Vitalije's proposed generator-based code, including backward 
scans to find the start of functions.

In contrast, Leo's new importers assign only entire lines to nodes.  A 
character/token-oriented approach can not be ruled out entirely, but it 
would require a complete replacement of the js importer, something that I 
do not think is necessary.

Finally (on this point), a *reasonable* import of poorly formatted code 
almost certainly will require changing leading whitespace. Otherwise, nodes 
may contain common leading whitespace that any reasonable person would 
prefer not to see. 
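
To make the whitespace point concrete, here is a minimal sketch of removing common leading whitespace from the lines assigned to one node. The function name strip_common_indent is hypothetical, not part of Leo's importer, and the sketch assumes indentation is counted in characters (a tab counts as one), which may not suit sources that mix tabs and spaces.

```python
def strip_common_indent(lines):
    """Remove the leading whitespace shared by all non-blank lines.

    Hypothetical helper for illustration; not Leo's actual importer code.
    """
    def indent_width(line):
        # Count leading blanks and tabs as single characters (an assumption).
        return len(line) - len(line.lstrip(' \t'))

    widths = [indent_width(s) for s in lines if s.strip()]
    common = min(widths) if widths else 0
    # Blank lines carry no useful indentation, so reduce them to a bare newline.
    return [s[common:] if s.strip() else s.lstrip(' \t') for s in lines]


node_lines = [
    "    function calculate_size() {\n",
    "        var height = $(window).height() - header.offsetHeight;\n",
    "    }\n",
]
for line in strip_common_indent(node_lines):
    print(line, end='')
```

Note that writing such a node back out reproduces the original file only if the removed whitespace is restored from the indentation of the parent's @others line.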

3. Javascript's regex syntax is a language abomination.  *It is 
context-dependent whether something that looks like a string (or regex) is 
actually a string or regex.* It's difficult to tokenize text in the forward 
direction.  It's probably impossible to do so in the backward direction in 
any reasonable amount of time.  This casts strong doubt on whether any 
character-based approach can be sound.
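
A rough sketch of the ambiguity: whether a "/" opens a regex or is a division operator depends on what came before it. The heuristic below is an illustration only (the name slash_starts_regex and the keyword list are assumptions, not Leo code), and it is deliberately incomplete.

```python
import re

# Hypothetical heuristic: a '/' is treated as division when the preceding text
# ends with something that can end an expression (identifier, number, ')' or ']').
ENDS_EXPRESSION = re.compile(r'[\w$)\]]$')
# Keywords after which an operand, and hence a regex literal, is expected.
OPERAND_KEYWORDS = re.compile(r'\b(return|typeof|case|in|of|delete|void|throw)$')


def slash_starts_regex(text_before):
    """Guess whether a '/' that follows text_before opens a regex literal."""
    prev = text_before.rstrip()
    if not prev:
        return True  # Start of input: there is nothing to divide.
    if OPERAND_KEYWORDS.search(prev):
        return True
    return not ENDS_EXPRESSION.search(prev)


print(slash_starts_regex("x = "))      # regex expected after '='
print(slash_starts_regex("a + b "))    # division expected after an identifier
print(slash_starts_regex("if (x) "))   # heuristic says division, but this is a regex
```

The last case shows the problem: in `if (x) /re/.test(y)` the slash opens a regex, yet the text just before it looks like the end of an expression. Deciding correctly requires knowing what the closing parenthesis belongs to, which a backward scan does not know.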

*The acid test for the js importer*

Sadly, Vitalije's test files are the easiest kinds of files to handle. They 
contain no nested functions and a simple pattern match determines the class 
to which each function should belong.  For these files *only* it is 
relatively straightforward to generate reasonable @others statements.

Here is a much harder test, stripped to its essence from the main.js of a 
widely used package (I forget which):

require([
    'jquery',
], function(
        $, 
        termjs,
){
    var header = $("#header")[0];
    function calculate_size() {
        var height = $(window).height() - header.offsetHeight;
    }
    page.show_header();
    window.onresize = function() { 
      terminal.socket.send(JSON.stringify([
            "set_size", geom.rows, geom.cols,
            $(window).height(), $(window).width()])
        );
    };
    window.terminal = terminal;
});

Please note:

1. The body of the anonymous "require" function contains multiple interior 
(named) functions.
2. The body of the anonymous "require" function contains additional code 
*before, between and after* the named functions.

For these and other reasons, I believe the most reasonable top-level node 
would be:

require([
    'jquery',
], function(
        $, 
        termjs,
){
    @others
}); // end require

But now we come to a complication.  The overall file could consist of 
several instances of this (or similar) code.  In that case, the top-level 
node would almost certainly have to consist of just a single @others 
directive.  All the other "classes" would then migrate down to direct child 
nodes.

And there is an additional (very important!) complication in the original 
example.  The code that comes before, between and after the named functions 
must be included in exactly one child node.  For instance, the child 
representing the calculate_size function must look like this:

var header = $("#header")[0];
    // head must be added
function calculate_size() {
    var height = $(window).height() - header.offsetHeight;
}

That is, *everything* preceding this function definition *must* be included 
in the node.  The only alternatives would be to use a section reference, or 
to put what precedes the first child into the top-level node, like this:

require([
    'jquery',
], function(
        $, 
        termjs,
){
    var header = $("#header")[0];
    @others
}); // end require

Imo, there is no reason to do this.  Every other child node must already 
include the code that precedes its function, so there is no reason to 
special case the first child node.

In any case, the next child node, representing window.onresize, will 
contain:

page.show_header();
window.onresize = function() { 
  terminal.socket.send(JSON.stringify([
        "set_size", geom.rows, geom.cols,
        $(window).height(), $(window).width()])
    );
};
window.terminal = terminal;
    // Tail must be added.

Again, we could special case the last child node and move this line into 
the top-level node:

require([
    'jquery',
], function(
        $, 
        termjs,
){
    @others
    window.terminal = terminal;
}); // end require

Perhaps this special case is a bit more appealing than the first.

*Summary*

The requirement that no node contain multiple @others directives means that 
a function that defines multiple interior functions must split those 
functions into multiple direct children.

The contents of these direct children must "cover" the entire body of the 
parent function.  The code that comes before, between and after the named 
functions must be included in exactly one child node. As possible special 
cases, code preceding the first child or following the last child might 
migrate to the parent.
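
The covering rule can be sketched as follows. This is only an illustration under stated assumptions: the function split_into_children is hypothetical, and it takes the line ranges of the interior functions as given (function_ranges), whereas finding those ranges is the hard part of the importer. The sketch does not implement the optional special cases.

```python
def split_into_children(body_lines, function_ranges):
    """Split body_lines into child blocks that together cover the whole body.

    function_ranges is a hypothetical, pre-computed list of (start, end) index
    pairs (end exclusive), in source order, one pair per interior function.
    """
    if not function_ranges:
        return [list(body_lines)]  # No interior functions: a single block.
    children, prev_end = [], 0
    for start, end in function_ranges:
        # Each child holds the code since the previous function, then the function.
        children.append(list(body_lines[prev_end:end]))
        prev_end = end
    # Code following the last function joins the last child.
    children[-1].extend(body_lines[prev_end:])
    return children


body = [
    'var header = $("#header")[0];',
    'function calculate_size() {',
    '    var height = $(window).height() - header.offsetHeight;',
    '}',
    'page.show_header();',
    'window.onresize = function() {',
    '};',
    'window.terminal = terminal;',
]
for child in split_into_children(body, [(1, 4), (5, 7)]):
    print('\n'.join(child))
    print('-----')
```

Concatenating the returned blocks in order reproduces body_lines, which is the sense in which the children "cover" the parent's body.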

Naturally, child nodes can (and often do) define multiple "grand-child" 
functions, so the process of splitting the file into nodes is naturally 
recursive.

It's impossible to determine (in a single pass) the optimal way of 
splitting an outer function into child nodes.  That depends on whether a 
function has zero, one or multiple internal children.

I am convinced that Leo's existing js importer can be adapted to handle all 
these complications. It won't be easy, but that does not invalidate the 
approach. Easier ways will almost surely not be sound. Scanning backward 
founders on context-dependent tokens.

All recent comments have been valuable. They have highlighted the 
limitations of using section references.

Edward
