The next phase of the project is to complete the code that splits long 
lines and joins short lines. I want this code to be as simple as possible. 
The crucial split/join "snippets" should advertise the virtues of the TOG 
class.

Just as with the code that handles slices, I have only a vague idea of what 
the final split/join code will look like. This ENB notebook post attempts 
to clarify issues relating to the split/join logic. As always, feel free to 
ignore it.

*Background*

At present, the code that splits lines is *entirely* token based. This 
*usually* works well enough, but the token-based code relies on an open 
parenthesis (token) already being present in the statement. If this open 
paren exists, the long line may safely be split anywhere between tokens. 
Most long lines involve function call statements (ast.Call nodes), and such 
statements do indeed contain the needed open paren. Alas, other Python 
statements, including returns and assignments, may not already contain one. 
The split code must know where to insert the required pair of parens.
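
As a concrete illustration (a sketch of the idea, not leoAst's actual 
code), a purely token-based scan can collect the legal split points once an 
open paren is present. The function name and details here are mine:

```python
# Hypothetical sketch: find the column offsets at which a long call
# line could safely be split, assuming an open paren already exists.
import io
import tokenize

def split_candidates(line):
    """Return end-columns of tokens at or after the first '(' in line."""
    if not line.endswith('\n'):
        line += '\n'  # tokenize wants a complete logical line.
    tokens = tokenize.generate_tokens(io.StringIO(line).readline)
    cols, seen_paren = [], False
    for tok in tokens:
        if tok.type == tokenize.OP and tok.string == '(':
            seen_paren = True
        # Any boundary between tokens after the '(' is a candidate.
        if seen_paren and tok.type not in (tokenize.NEWLINE, tokenize.ENDMARKER):
            cols.append(tok.end[1])
    return cols

print(split_candidates("print(a, b, c)"))
```

Without a parse tree, nothing in this scan can tell a return or an 
assignment where a new pair of parens would have to go.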

In short, my working assumption is that access to the parse tree is 
essential, or at least very helpful, in the split logic. Ditto for the join 
logic.


*Gaining access to the parse tree*

o.colon could get the relevant parse tree from self.token.node, because *colons 
are significant tokens*. Job done.

How to access the parse tree for long lines? Using the newline token seems 
reasonable, because newline tokens are also significant. However, the 
one-line code snippets used by the split/join logic don't contain *any* 
newlines.

*Problems assigning newline tokens*

More generally, the last newline of a code snippet is assigned to the 
ast.Module node. At the very least, this must change. Or must it? And if 
so, how?

We could ignore (temporarily) the problems with assigning tokens to nodes. 
For example, we could "trigger" the split/join logic in the o.name token 
handler. "name" tokens are significant, so self.token.node will be the 
parse tree for the name. For function calls, we would have to look up the 
tree to determine whether the name is a function name. Doable, but not 
pretty.
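
To see why it isn't pretty: deciding whether a Name node is a function name 
requires a parent link that ast does not provide. A sketch (the parent map 
is my own scaffolding, not part of the TOG class):

```python
# Hypothetical sketch: decide whether a Name node is the function
# being called, by looking "up the tree".  ast nodes have no parent
# links, so we must build a parent map first.
import ast

def is_function_name(tree, name_node):
    parents = {
        child: parent
        for parent in ast.walk(tree)
        for child in ast.iter_child_nodes(parent)
    }
    parent = parents.get(name_node)
    return isinstance(parent, ast.Call) and parent.func is name_node

tree = ast.parse("f(x)")
# ast.walk yields the Name for 'f' (the function) before the Name for 'x'.
f_node, x_node = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
```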

"return", "if", "while" etc are keywords, so the parse tree is usable as 
is. Assignments would require a trigger on "=" tokens, that is, op tokens 
whose value is "=".

So this approach is clunky. It spreads the split/join logic over too many 
nodes. It seems more reasonable to trigger the split/join logic on the 
'newline' token, or the 'endmarker' token for the special case that the 
file/snippet ends without a newline. Or maybe we can just force a trailing 
newline for all files/snippets.
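
Forcing the trailing newline would be trivial. A one-line sketch (my 
naming, not leoAst's):

```python
def ensure_trailing_newline(contents):
    """Guarantee that a file/snippet ends with a newline, so the
    split/join logic can always trigger on a 'newline' token."""
    return contents if contents.endswith('\n') else contents + '\n'
```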

*Extrinsically significant tokens*

At present, tokens are classified as either significant or insignificant. 
That is, "significance" is an *intrinsic* property of each token. This is 
foolish, and limiting.

Indeed, the ast.Call and ast.Tuple visitors already call tog.gen_token for 
parentheses tokens. In such contexts, parens should be considered 
significant, and the eventual call to tog.sync_token should synchronize on 
those tokens. This would ensure that the parens are assigned to the proper 
node! Alas, sync_token doesn't do that. At present, it just stupidly 
returns, assigning the parens (later) to the next "officially" significant 
token. As a result, parens are not assigned properly for calls and tuples.

We could go further, and have various visitors generate comma tokens, but I 
doubt that would ever be useful.
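
One way to make significance extrinsic (a sketch only; the class and method 
names here are mine, not leoAst's) is to let visitors push token values 
that should be synchronized on in the current context:

```python
# Hypothetical sketch of context-dependent ("extrinsic") significance:
# visitors push token values that sync_token should match instead of
# skipping, so parens in calls and tuples get assigned to the proper node.

class SyncSketch:
    INTRINSIC = {'name', 'number', 'string', 'newline'}

    def __init__(self, tokens):
        self.tokens = tokens   # list of (kind, value) pairs
        self.index = 0
        self.extra = []        # stack of context-significant values

    def push_significant(self, value):
        self.extra.append(value)

    def is_significant(self, kind, value):
        return kind in self.INTRINSIC or value in self.extra

    def sync_token(self, kind, value):
        """Skip insignificant tokens, then match and assign (kind, value)."""
        while self.index < len(self.tokens):
            k, v = self.tokens[self.index]
            self.index += 1
            if self.is_significant(k, v):
                assert (k, v) == (kind, value), ((k, v), (kind, value))
                return self.index - 1   # index of the token just assigned
        raise ValueError('token stream exhausted')

# An ast.Call visitor would push '(' and ')' before syncing on them:
sync = SyncSketch([('name', 'f'), ('op', '('), ('name', 'x'), ('op', ')')])
sync.push_significant('(')
sync.push_significant(')')
```

With the parens pushed, sync_token matches them in order instead of 
deferring them to the next intrinsically significant token.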

*Summary*

Whatever happens, the code should properly assign paren tokens in calls and 
tuples. Ditto for newline tokens that end many statement lines. Only 
tog.sync_token will need to change, but that will be surprisingly tricky. 
Details omitted.

I'll investigate using the parse tree as a guide to splitting and joining 
lines only after parens and newline tokens are more reasonably assigned to 
ast nodes.

Edward

P. S. There is another complication: statements may become "long" via 
Python's backslash-newline convention. The black tool itself takes the 
extreme view that backslash-newlines should always be eliminated. But this 
would be wrong, wrong, wrong in Leo, because Leo nodes cannot represent 
underindented triple-quoted strings. For example, all of the unit tests in 
leoAst.py for multi-line test code contain this pattern:

    # use r""" if lines contain backslashes.
    contents = """\
line 1
line 2
"""

Depending on the outline level of the node in which this code resides, line 
1, line 2 etc. will initially contain *unseen leading whitespace*. The 
test-running code removes such leading whitespace. Anyway, Leo depends on 
the backslash-newline convention.
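
For what it's worth, the same cleanup can be done outside Leo with 
textwrap.dedent, which removes exactly this kind of common leading 
whitespace (a sketch, not Leo's actual test-running code):

```python
# Sketch: the backslash after the opening triple quote suppresses a
# leading blank line; dedent then strips the common leading whitespace
# that outline nesting would otherwise leave in line 1, line 2, etc.
import textwrap

contents = """\
    line 1
    line 2
    """
print(textwrap.dedent(contents))
```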

Even outside Leo the coding pattern shown above seems perfectly reasonable 
for unit-tests. Why prohibit it?

EKR

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/4078c167-1649-4a3f-9497-f2ef0db854c1%40googlegroups.com.
