This is an Engineering Notebook post. It will be of interest only to Leo's core devs. It discusses questions relating to #639: Javascript importer should use @others everywhere <https://github.com/leo-editor/leo-editor/issues/639>.

The goal is not in doubt. We all agree that the js importer should never generate section references. Two important questions immediately arise:

1. What, exactly, should the js importer generate?
2. How should the js importer generate the desired code?

We can't answer question 2 until we are completely clear about the answer to question 1. But as we shall see, the answer to question 2 probably follows from the details.

*Background*

People have differences of opinion...Every design or implementation choice carries a trade-off and numerous costs. There is seldom a right answer.
-- Mozilla-central code of conduct

There can be no better summary of this topic ;-)

1. Until recently, Javascript had *no* syntax that organizes functions into classes. Instead, each organization chooses its own way (or ways!) of defining the *stand-ins* for classes. Yes, there are a number of standard patterns, but I would strongly prefer that the js importer "just work" regardless of pattern.

2. The question of how faithfully to recreate (round-trip) existing code is fraught with unexpected and vexing difficulties.

We cannot assume that the js sources are consistently indented. Some of the sources I have seen have had almost random indentation. Otoh, I think it is reasonable to assume that the js sources are not minified. But even this isn't cast in stone.

Automatically "beautifying" the code would, imo, be a throwback to old code. Leo's old importers attempted quasi-parses of the text. They worked much like Vitalije's proposed generator-based code, including backward scans to find the start of functions. In contrast, Leo's new importers assign only entire lines to nodes.

A character/token-oriented approach cannot be ruled out entirely, but it would require a complete replacement of the js importer, something that I do not think is necessary.

Finally (on this point), a *reasonable* import of poorly formatted code almost certainly will require changing leading whitespace.
Otherwise, nodes may contain common leading whitespace that any reasonable person would prefer not to see.

3. Javascript's regex syntax is a language abomination. *It is context-dependent whether something that looks like a string (or regex) is actually a string or regex.* It's difficult to tokenize text in the forward direction. It's probably impossible to do so in the backward direction in any reasonable amount of time. This casts strong doubt on whether any character-based approach can be sound.

*The acid test for the js importer*

Sadly, Vitalije's test files are the easiest kinds of files to handle. They contain no nested functions, and a simple pattern match determines the class to which each function should belong. For these files *only* it is relatively straightforward to generate reasonable @others directives.

Here is a much harder test, stripped to its essence from the main.js in a widely-used package. I forget which:

    require([
        'jquery',
    ], function(
        $,
        termjs,
    ){
        var header = $("#header")[0];
        function calculate_size() {
            var height = $(window).height() - header.offsetHeight;
        }
        page.show_header();
        window.onresize = function() {
            terminal.socket.send(JSON.stringify([
                "set_size", geom.rows, geom.cols,
                $(window).height(), $(window).width()])
            );
        };
        window.terminal = terminal;
    });

Please note:

1. The body of the anonymous "require" function contains multiple interior (named) functions.
2. The body of the anonymous "require" function contains additional code *before, between and after* the named functions.

For these and other reasons, I believe the most reasonable top-level node would be:

    require([
        'jquery',
    ], function(
        $,
        termjs,
    ){
        @others
    }); // end require

But now we come to a complication. The overall file could consist of several instances of this (or similar) code. In that case, the top-level node would almost certainly have to consist of just a single @others directive. All the other "classes" would then migrate down to direct child nodes.

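The rule in the paragraph above can be sketched in a few lines of Python. This is only an illustration, not code from Leo's importer: `root_body` is a hypothetical name, and the sketch assumes that each top-level block has already been separated into its opening lines, its interior lines and its closing lines, which is the hard part.

```python
def root_body(blocks):
    """Return (root_lines, children) for a file made of top-level blocks.

    Hypothetical helper. Each block is a (head, body, tail) tuple of line
    lists: the lines that open the block, its interior, and its closing lines.
    """
    if len(blocks) == 1:
        head, body, tail = blocks[0]
        # One block: the root keeps the opening and closing lines and
        # hands the interior to its children through a single @others.
        return head + ["@others"] + tail, [body]
    # Several blocks: the root is only @others, and every block moves
    # down, whole, into its own direct child.
    return ["@others"], [head + body + tail for head, body, tail in blocks]
```

With a single block the root looks like the top-level node shown earlier. With two or more blocks the root shrinks to a bare @others, and each block must then be split further inside its own child.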
And there is an additional (very important!) complication in the original example. The code that comes before, between and after the named functions must be included in exactly one child node. For instance, the child node representing the calculate_size function must look like this:

    var header = $("#header")[0]; // head must be added
    function calculate_size() {
        var height = $(window).height() - header.offsetHeight;
    }

That is, *everything* preceding this function definition *must* be included in the node. The only alternatives would be to use a section reference, or to put what precedes the first child into the top-level node, like this:

    require([
        'jquery',
    ], function(
        $,
        termjs,
    ){
        var header = $("#header")[0];
        @others
    }); // end require

Imo, there is no reason to do this. Every other child node must include the code that precedes its function, so there is no reason to special case the first child node.

In any case, the next child node, representing window.onresize, will contain:

    page.show_header();
    window.onresize = function() {
        terminal.socket.send(JSON.stringify([
            "set_size", geom.rows, geom.cols,
            $(window).height(), $(window).width()])
        );
    };
    window.terminal = terminal; // Tail must be added.

Again, we could special case the last child node and move this line into the top-level node:

    require([
        'jquery',
    ], function(
        $,
        termjs,
    ){
        @others
        window.terminal = terminal;
    }); // end require

Perhaps this special case is a bit more appealing than the first.

*Summary*

The requirement that no node contain multiple @others directives means that a function that defines multiple interior functions must split those functions into multiple direct children.

The contents of these direct children must "cover" the entire body of the parent function. The code that comes before, between and after the named functions must be included in exactly one child node. As possible special cases, code preceding the first child or following the last child might migrate to the parent.

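The covering rule can be illustrated with a short, line-oriented Python sketch. It is not Leo's importer: `cover_body` and its `spans` argument are hypothetical, and the sketch assumes the line ranges of the interior functions are already known. It attaches code before or between functions to the function that follows, and trailing code to the last function, without the optional special cases.

```python
def cover_body(lines, spans):
    """Split the body lines of a function into child chunks that cover every line.

    spans: sorted, non-overlapping (start, end) line-index pairs, end
    exclusive, one per interior function (hypothetical input).
    """
    if not spans:
        return [lines] if lines else []
    children = []
    prev_end = 0
    for i, (_start, end) in enumerate(spans):
        # Each child runs from the end of the previous child through the end
        # of its own function; the last child also takes the trailing code.
        stop = len(lines) if i == len(spans) - 1 else end
        children.append(lines[prev_end:stop])
        prev_end = stop
    return children


# The body of the "require" function from the example above.
BODY = """\
var header = $("#header")[0];
function calculate_size() {
    var height = $(window).height() - header.offsetHeight;
}
page.show_header();
window.onresize = function() {
    terminal.socket.send(JSON.stringify([
        "set_size", geom.rows, geom.cols,
        $(window).height(), $(window).width()])
    );
};
window.terminal = terminal;
""".splitlines()

# calculate_size occupies lines 1-3, window.onresize lines 5-10.
children = cover_body(BODY, [(1, 4), (5, 11)])
```

Here the first child starts with the `var header` line and the second ends with `window.terminal = terminal;`, matching the two child nodes described earlier. Concatenating the children reproduces the body exactly, which is what "cover" means here.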
Naturally, child nodes can (and often do) define multiple "grand-child" functions, so the process of splitting the file into nodes is recursive.

It's impossible to determine (in a single pass) the optimal way of splitting an outer function into child nodes. That depends on whether a function has zero, one or multiple internal children.

I am convinced that Leo's existing js importer can be adapted to handle all these complications. It won't be easy, but that does not invalidate the approach. Easier ways will almost surely not be sound. Scanning backward founders on context-dependent tokens.

All recent comments have been valuable. They have highlighted the limitations of using section references.

Edward