Idea:
Use the tokenize module from the Python standard library to find where
all function/method and class definitions start, and then use this data
to find the lines where the top-level children should start. After
creating the top-level children, the process can be repeated for every
node longer than some threshold number of lines, generating the
second-level children.
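
For example, the core of the first step could be sketched like this
(the helper name definition_rows is mine, just for illustration):

import io
import token
import tokenize

def definition_rows(txt):
    # Yield (row, col) for every 'def'/'class' keyword in the source.
    # col == 0 marks a top-level definition.
    for tok in tokenize.generate_tokens(io.StringIO(txt).readline):
        if tok.type == token.NAME and tok.string in ('def', 'class'):
            yield tok.start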

A little bit of background story (feel free to skip if you just want to
see the example code):
A long time ago I tried to solve this problem in a more efficient way
for importing JavaScript files. I remember looking at the Importer
class and the way Leo did imports at the time, and feeling that it was
more complicated than necessary. I can't say that I solved this problem
in general, but for that very specific case it worked pretty well.

Recent posts about improving Leo in this area, especially regarding
Python, made me think about this problem again.

I strongly feel that the main problem with the current implementation
is its insistence on using scan_line. That may be suitable for unifying
all the other source languages, but it is far from optimal for Python
source files.

The way I see it, the task is to search for and find the lines where a
new node should start. Whether that node should be indented or not, I
would rather leave for the next phase. The Python files which I start
from scratch in Leo usually have a few lines in the top-level node,
then @others, and after @others usually comes the block with
`if __name__ == '__main__':`. If I have a lot of imports, I usually put
them all in one section node `<<imports>>`.
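
For illustration, the body of such a root node typically looks like
this (the section name and the main() call are just placeholders):

'''A few introductory lines: docstring, etc.'''
<<imports>>
@others
if __name__ == '__main__':
    main()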

The first line where I would introduce the first child node is the
first function or class definition. Everything before it should go
directly into the root node; imports can be extracted later.

Attached to this message is a Python script which can be executed
inside Leo. It imports any Python module from the standard library and
checks whether the import is perfect or not. At the moment it just
extracts the top-level definitions into separate nodes, the direct
children of the root.

import io
import token
import tokenize
from collections import defaultdict

def make_headline(line):
    # Placeholder: the attached script defines its own make_headline;
    # here the stripped definition line itself serves as the headline.
    return line

def find_node_borders(txt):
    '''
    Returns a list of [startrow, endrow, headline] lists for the
    direct children of the node. Rows are 1-based, and each range
    is half-open: [startrow, endrow).
    '''
    inp = io.StringIO(txt)
    tokens = list(tokenize.generate_tokens(inp.readline))
    res = []
    open_definitions = defaultdict(list)
    for tok in tokens:
        row, col = tok[2]
        if tok[0] == token.DEDENT:
            # A dedent to column col closes every definition that
            # started at column col or deeper.
            for k in open_definitions:
                if k >= col:
                    for r in open_definitions[k]:
                        if r[2] is None:
                            r[2] = row
                    del open_definitions[k][:]
        elif tok[0] == token.NAME and tok[1] in ('def', 'class'):
            res.append([row, col, None, tok[-1].strip()])
            open_definitions[col].append(res[-1])
    nodes = [[1, 1, '']]  # the block before the first definition
    for a, col, b, x in res:
        if col > 0:
            continue  # ignore nested definitions
        if a > nodes[-1][1]:
            # stretch the previous node to just before this definition
            nodes[-1][1] = a
        nodes.append([a, b, make_headline(x)])
    # the trailing block after the last definition, up to end of file
    nodes.append([nodes[-1][1], None, ''])
    return nodes
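
For completeness, here is how the function could be exercised. The
sample module is made up, and the rebuilt/assert part is only my sketch
of the "perfect import" check, not the attached script's exact code:

if __name__ == '__main__':
    sample = (
        "import os\n"
        "\n"
        "def f():\n"
        "    return os.sep\n"
        "\n"
        "class C:\n"
        "    def m(self):\n"
        "        pass\n"
    )
    nodes = find_node_borders(sample)
    for start, end, headline in nodes:
        print(start, end, repr(headline))
    # The import is "perfect" when re-joining the node bodies
    # reproduces the original source exactly.
    lines = sample.splitlines(True)
    rebuilt = ''.join(
        ''.join(lines[a - 1:(b - 1) if b else None])
        for a, b, _ in nodes)
    assert rebuilt == sample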
