Recent revs in the js-importer branch provide a basic fix for #1481
<https://github.com/leo-editor/leo-editor/issues/1481>. This long
Engineering Notebook post discusses this fix, its limitations, further
problems, and possible fixes for those problems.
As always, feel free to ignore. However, this post describes important
aspects of the code and its design. It may be of more than usual interest
to Leo's devs.
*The immediate problem*
Perfect import failed for reveal.js because *i.gen_lines* failed to
allocate lines properly to nodes. This caused lines to appear out of order.
i.gen_lines is part of the *pipeline*, defined in the base *Importer class*.
At present, the *JS_Importer* class (and most other importers) are
subclasses of the Importer class. Iirc, all importers use i.gen_lines
unchanged. The JS importer overrides i.starts_block, one of i.gen_lines's
helpers.
i.gen_lines uses a *parse state* to determine the start and end of *blocks*.
For Javascript, these blocks correspond (roughly) to classes and functions.
Alas, there are many ways to define a class or function in JS. JS is, by
far, the most difficult language to parse of all the languages handled by
Leo's importers.
Perfect import failed for reveal.js because the importer mistook a line
like if('function'){ as the start of a function. The present fix will
usually work (Leo can now import reveal.js), but not always, as I'll now
explain...
*Tokenizing Javascript*
You ask, how hard can it be to recognize strings like 'function' in JS? The
answer is "very very hard". Tokenizing JS depends on context. In
particular, it is difficult to determine whether a '/' character is the
"div" arithmetic operator or the start of a regular expression! JS is, by
far, the most difficult language to scan (tokenize) of all the languages
handled by Leo's importers.
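To make the '/' ambiguity concrete, here is a minimal Python sketch of the
usual heuristic: a '/' is division only when the previous significant token
could end an expression. This is not Leo's code. The name slash_starts_regex
and the keyword list are hypothetical, and the heuristic is known to be
incomplete (for example, a regex may legally follow the ')' of an if
statement).

```python
import re

# Hypothetical keyword list: after these words JS expects an expression,
# so a following '/' starts a regex literal.
EXPRESSION_KEYWORDS = {
    'return', 'typeof', 'case', 'in', 'of', 'delete',
    'void', 'throw', 'new', 'do', 'else',
}

def slash_starts_regex(s, i):
    """Guess whether the '/' at s[i] starts a regex (True) or is division (False)."""
    assert s[i] == '/'
    prefix = s[:i].rstrip()
    if not prefix:
        return True  # Nothing precedes the slash on this line.
    # Is the previous token an identifier or keyword?
    m = re.search(r'[A-Za-z_$][A-Za-z0-9_$]*$', prefix)
    if m:
        return m.group(0) in EXPRESSION_KEYWORDS
    # A number, ')' or ']' ends an expression, so '/' is division.
    return prefix[-1] not in ')]0123456789.'
```

For example, the slash in "x = a / b" is classified as division, while the
first slash in "var re = /'function'/;" is classified as the start of a regex.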
*i.scan_line* updates the parse state after *carefully* tokenizing each
line. The JS importer method *js_i.scan_line* overrides i.scan_line to
handle '/' properly. As you can see, it's not pretty. The bug happened
because *js_i.starts_block* contained regexes that didn't distinguish the
"function" keyword from a string containing "function". The partial fix is
mostly a hack: js_i.starts_block now keeps track of whether "function"
*looks* like it is in a string. But this faux tokenizing could fail if
quotes (or "function") appear in a regex, as they very well might.
A proper fix would involve fully tokenizing each line, in *both* js_i.scan_line
and in js_i.starts_block. Therefore, we want a stand-alone javascript
tokenizer, written in python. Present revs include a copy of JsLex.py
<https://bitbucket.org/ned/jslex/src/default/jslex.py>, but it isn't hooked
up yet. The JsLex code contains a note that it doesn't handle non-ascii
characters properly, so it may need substantial revision. Happily, JsLex
does contain a suite of unit tests.
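The following Python sketch illustrates the kind of quote tracking described
above and why it is only a partial fix. It is not the code in
js_i.starts_block: the pattern NAIVE_FUNCTION and the helper
function_outside_string are hypothetical names, and the helper deliberately
ignores regex literals, template strings and comments.

```python
import re

# A naive pattern, similar in spirit to (but not copied from) the regexes
# in js_i.starts_block: it matches the word anywhere in a line.
NAIVE_FUNCTION = re.compile(r'\bfunction\b')

def function_outside_string(line):
    """
    Return True if the word 'function' appears outside single- or
    double-quoted strings. Regexes are not recognized, so a quote
    inside a regex confuses this scan.
    """
    quote = None  # The delimiter of the string we are in, if any.
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == '\\':
                i += 1  # Skip the escaped character.
            elif ch == quote:
                quote = None  # End of the string.
        elif ch in '\'"':
            quote = ch  # Start of a string.
        elif NAIVE_FUNCTION.match(line, i):
            return True
        i += 1
    return False
```

The naive pattern matches if('function'){, which is the original bug. The
helper rejects that line and accepts promise.catch( function() {. However,
for a line such as x = /'/.test(s) ? function() {} : null; the quote inside
the regex is taken as the start of a string, so the real function keyword is
missed. Only a real tokenizer avoids this.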
*Other problems*
The lines of reveal.js that caused the perfect import to fail are not
handled very well even after perfect import succeeds. Here are the
complications...
The JS importer must be completely immune from indentation, and it is. As a
direct consequence, perfect import tests ignore leading whitespace.
However, once lines are allocated to nodes, the importer tries to *adjust*
nodes to make them as pleasing as possible. In particular, the importer
removes *common leading whitespace* from all nodes. In addition, under
special circumstances, the importer may try to move one or more lines from
the end of one node to the start of the previous or following sibling node.
The *post pass* part of the pipeline handles all these adjustments. It would
be difficult/impossible to handle them in gen_lines.
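As a rough illustration of the "common leading whitespace" adjustment, here
is a minimal Python sketch that uses the standard library's textwrap.dedent.
The helper name remove_common_leading_whitespace is hypothetical, and this is
not the post-pass code itself.

```python
import textwrap

def remove_common_leading_whitespace(lines):
    """
    Given the body lines of one node (each ending in a newline), strip the
    whitespace prefix shared by all non-blank lines.
    """
    body = ''.join(lines)
    return textwrap.dedent(body).splitlines(keepends=True)
```

Relative indentation within the node is preserved. Only the shared prefix is
removed.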
At present, the JS importer only generates @others directives, never
section references. This could be changed, but imo using @others is much
better. However, @others does impose additional constraints on what the
post pass can do. It's time for an extended example. Here is the gist of
the code that caused perfect import to fail:
function startEmbeddedContent( element ) {
    toArray( element.querySelectorAll( 'video, audio' ) ).forEach( function( el ) {
        if('function') { // The culprit
            promise.catch( function() {
                el.addEventListener( 'play', function() {
                    el.controls = false;
                } );
            } );
        }
    }
}
To have any chance of understanding what is going on, the post pass must be
disabled. Here are the results, without the post pass. Headlines are
preceded by ===:
=== function startEmbeddedContent
function startEmbeddedContent( element ) {
@others
}
=== toArray( element.querySelectorAll('video, audio')).forEach function
toArray( element.querySelectorAll('video, audio')).forEach( function(el) {
if('function') {
@others
}
=== promise.catch function
promise.catch(function() {
@others
} );
} // <=====
=== el.addEventListener('play', function
el.addEventListener('play', function() {
el.controls = false;
} );
Now you can see why a post pass is desirable.
It's "obvious" that the last line of the node "promise.catch function"
belongs in the *previous* node. However, *such a move cannot be done in
general!*
Only the post pass has any chance of having enough data to make this
adjustment. This adjustment can only be done because the node
"promise.catch function" is the *last* node under the range of the @others
in the node "toArray...". The adjustment would be invalid if the node
"promise.catch function" had any following siblings.
*Summary*
The present code now imports reveal.js without error. However, the latest
code is a hack. A proper fix entails carefully tokenizing lines in two
places. I plan to use JsLex to do this. JsLex will need work to handle
non-ascii characters properly. I'll do that as part of the fix for #1481.
The post pass attempts to reallocate lines to make the result more
palatable. The JS importer uses @others instead of section references. This
imposes constraints on possible adjustments. I'll attempt to improve the
post pass as another part of #1481. This may involve a rewrite/rethink.
This post has been part of the rethinking process.
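The constraint that @others imposes, described above for the node
"promise.catch function", can be sketched in Python as follows. The Node
class is a hypothetical stand-in for a Leo node, and expand and
move_trailing_line_to_parent are illustrative names, not part of the
importer. The point is that the move is safe only when the node is the last
child of its parent, because only then does the expanded text stay the same.

```python
class Node:
    """Hypothetical stand-in for a Leo node: headline, body lines, children."""
    def __init__(self, headline, lines, parent=None):
        self.headline = headline
        self.lines = list(lines)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def expand(node):
    """Return the node's text, with @others replaced by its children's text."""
    out = []
    for line in node.lines:
        if line.strip() == '@others':
            for child in node.children:
                out.extend(expand(child))
        else:
            out.append(line)
    return out

def move_trailing_line_to_parent(node):
    """
    Move the last line of node to just after the parent's @others line.
    Do nothing unless node is the last child. Return True if a line moved.
    """
    parent = node.parent
    if parent is None or not node.lines or parent.children[-1] is not node:
        return False  # A following sibling would end up before the moved line.
    for i, line in enumerate(parent.lines):
        if line.strip() == '@others':
            parent.lines.insert(i + 1, node.lines.pop())
            return True
    return False
```

A useful check for any such adjustment is that expand(root) is identical
before and after the move. That is the same invariant that perfect import
tests enforce.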
The present problems with Leo's JS importer arise from well-known
infelicities in JS itself. The fixes to #1481 actually show the strengths
of Leo's importer architecture. Fixes are straightforward and will be
confined to the JS_Importer class.
Edward