Yesterday I discovered a dead simple *design* pattern for finding *code* patterns in python parse trees.
This pattern is a milestone. It changes forever how I'll use python's ast <https://docs.python.org/3/library/ast.html> module. Old techniques now seem like trying to swat flies with a sledgehammer! The Aha is as surprising as the Aha that created @clean. I'll never forget it. *tl;dr: *Convert *parts* of parse trees to strings using ast.unparse. *Background* I have been revisiting an old problem: how to analyze Leo's code. The previous state of the art was unsatisfactory. There were two approaches, each flawed: 1. Use regex patterns. This approach works for prototyping, but there is no way to ignore false matches in strings, docstrings, and comments. 2. Use parse trees. This approach is sound, but finding patterns in ast nodes was cumbersome. How to get the best of both approaches? Finally, my work with the rope project made me familiar with a new design pattern: small, special-purpose tree traversers based on ast.NodeVisitor <https://docs.python.org/3/library/ast.html#ast.NodeVisitor>. *The Problem* I wanted a program to find all chains of ast Attribute nodes within Leo. Examples are g.app.gui and c.frame.body.wrapper.widget. I packaged this program as the TestChains.test_all_paths unit test in leo/unittests/test_design.py. This is a new file: see PR #3319 <https://github.com/leo-editor/leo-editor/pull/3319>. Early attempts were clumsy, as before. ast.Attribute trees can contain much more than ast.Name and ast.Attribute nodes. *The Aha: use ast.unparse* I had vaguely planned to experiment with ast.unparse <https://docs.python.org/3/library/ast.html#ast.unparse>. Given an ast node/tree, ast.unparse returns a string with code that would produce an equivalent node/tree. Aha: This string is exactly what we need to discover patterns in parse trees!!! Here is the *complete* traverser class that discovers all chains: class ChainsTraverser(NodeVisitor): chains_set = set() def visit_Attribute(self, node): """ Add only top-level Attribute chains to chains_set. Do *not* call generic_visit! """ chain = ast.unparse(node) self.chains_set.add(chain) OMG! This is amazing: - unparse returns the *entire* chain, no matter how complex. - visit_Attribute handles only top-level attributes because it *doesn't* call ast.generic_visit <https://docs.python.org/3/library/ast.html#ast.NodeVisitor.generic_visit>. A straightforward unit test (TestChains.test_all_paths) instantiates ChainsTraverser once, calls the traverser for most of Leo's core files, and prints/tests ChainsTraverser.chains_set. Nothing could be simpler. *Summary* Finding patterns in ast trees is straightforward: - Use ast visitors to find the *roots* of ast trees containing potential patterns. - Convert those roots to strings with ast.unparse. - Use regexs to filter the results. See the PostScript. This Aha changes my whole approach to analyzing ast trees. Edward P.S. TestChains.test_all_paths filters the resulting chains as follows: array_pat = re.compile(r'(\[).*?(\])') call_pat = re.compile(r'(\().*?(\))') string_pat1 = re.compile(r"(\').*?(\')") string_pat2 = re.compile(r'(\").*?(\")') patterns = (array_pat, call_pat, string_pat1, string_pat2) def filter_chain(s: str) -> str: def repl(m): return m.group(1) + m.group(2) for pattern in patterns: s = re.sub(pattern, repl, s) return s This kind of post-processing would be impossible if the operands were ast trees. Edward -- You received this message because you are subscribed to the Google Groups "leo-editor" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/leo-editor/a6fcd809-1555-47b2-ba15-767498270ec6n%40googlegroups.com.
