Idea:
Use the tokenize module from the standard library to find where all
function/method and class definitions start, and then use this data to
find the lines where the top-level children should start. After creating
the top-level children, the process can be repeated for all nodes which
have more than a certain threshold number of lines, generating the
second-level children.
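
For example, the first step really does take only a few lines. This is
just a minimal sketch of the idea (the name list_definitions and the
sample source are mine, for illustration only): it prints the row,
column, and source line of every def and class keyword.

import io
import token
import tokenize

def list_definitions(src):
    # Print (row, col, line) for every def/class keyword in src.
    toks = tokenize.generate_tokens(io.StringIO(src).readline)
    for tok in toks:
        if tok.type == token.NAME and tok.string in ('def', 'class'):
            row, col = tok.start
            print(row, col, tok.line.strip())

sample = '''\
import os

class Spam:
    def eggs(self):
        pass

def ham():
    pass
'''
list_definitions(sample)
# prints:
# 3 0 class Spam:
# 4 4 def eggs(self):
# 7 0 def ham():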
A little bit of background story (feel free to skip if you just want to
see the example code):
Long ago I tried to solve this problem in a more efficient way for
importing JavaScript files. I remember looking at the Importer class and
the way Leo did imports at the time, and feeling that it was more
complicated than necessary. I can't say that I've solved this problem in
general, but for a very specific case it worked pretty well. Recent
posts about improving Leo in this area, especially regarding Python,
made me think about this problem again.
I strongly feel that the main problem with the current implementation is
its insistence on using scan_line. That may be suitable for unifying all
the other source languages, but it is far from optimal when it comes to
Python source files.
The way I see this problem is to search for the lines where a new node
should start. Whether that node should be indented or not, I would
rather leave for the next phase. First of all, the Python files which I
start from scratch in Leo usually have a few lines in the top-level
node, then comes at-others, and after at-others usually comes the block
with `if __name__ == '__main__':`. If I have a lot of imports, then I
usually put all the imports in one section node `<<imports>>`. A sketch
of this layout follows below.
The first line where I would introduce the first child node is the first
function or class definition. Everything before it should go directly in
the root node. Imports can be extracted later.
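
For illustration, the body of such a root node looks roughly like this
(this is just my typical layout, not something the script produces; the
main() call is a placeholder):

'''Module docstring.'''
<<imports>>
@others
if __name__ == '__main__':
    main()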
Attached to this message is a Python script which can be executed inside
Leo. It imports any module from the standard Python library and checks
whether the import is perfect or not (see the usage sketch after the
code below). At the moment it just extracts the top-level definitions
into separate nodes, the direct children of the root.
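The function below calls a make_headline helper which is defined
elsewhere in the attached script. Here is a minimal stand-in (my guess
at its behavior, not the attached code) so the function runs on its own:
it turns "def foo(args):" into "foo" and "class Bar(Base):" into
"class Bar".

def make_headline(line):
    # Stand-in for the attached script's make_headline: drop the 'def'
    # keyword and the argument list; keep the 'class' keyword.
    if line.startswith('def '):
        return line[4:].partition('(')[0].strip()
    return 'class ' + line[6:].partition('(')[0].partition(':')[0].strip()
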
import io
import token
import tokenize
from collections import defaultdict

def find_node_borders(txt):
    '''
    Returns a list of [startrow, endrow, headline] entries
    for direct children of the node.
    '''
    inp = io.StringIO(txt)
    tokens = list(tokenize.generate_tokens(inp.readline))
    res = []
    open_definitions = defaultdict(list)
    for tok in tokens:
        row, col = tok[2]
        if tok[0] == token.DEDENT:
            # a dedent to column col closes every definition that was
            # opened at an indentation level >= col
            for k in open_definitions:
                if k >= col:
                    for r in open_definitions[k]:
                        if r[2] is None:
                            r[2] = row
                    del open_definitions[k][:]
        elif tok[0] == token.NAME and tok[1] in ('def', 'class'):
            # [startrow, col, endrow (filled in on dedent), source line]
            res.append([row, col, None, tok[-1].strip()])
            open_definitions[col].append(res[-1])
    nodes = [[1, 1, '']]
    for a, col, b, x in res:
        if col > 0:
            continue  # ignore deeper definitions
        if a > nodes[-1][1]:
            # extend the previous node up to this definition's start
            nodes[-1][1] = a
        nodes.append([a, b, make_headline(x)])
    # trailing node for whatever follows the last top-level definition
    nodes.append([nodes[-1][1], None, ''])
    return nodes
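
To exercise it on a real module, here is a sketch of the perfectness
check mentioned above (my reconstruction of what the attached script
does, assuming node bodies are rebuilt by slicing the source lines):
split a standard-library module and verify that the pieces join back
into the original source.

import inspect
import json  # any pure-Python stdlib module will do

src = inspect.getsource(json)
lines = src.splitlines(True)  # keep the line endings
nodes = find_node_borders(src)
pieces = []
for start, end, headline in nodes:
    stop = (end - 1) if end else len(lines)
    pieces.append(''.join(lines[start - 1:stop]))
assert ''.join(pieces) == src, 'import is not perfect'

Repeating find_node_borders on every piece above some threshold number
of lines would then give the second-level children, though nested
definitions start at col > 0, so the recursion would have to dedent the
piece or pass the column along.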