The new code hasn't been upped yet. There are a few bugs remaining, but already the new base is a more robust and flexible way of handling the myriad complexities of javascript.
It's obvious to me that none of this would have happened if Tom had not suggested <https://groups.google.com/d/msg/leo-editor/Ct3ZKqTo_KE/78R-I4-UAwAJ> that the importers be re-imagined. The specifics weren't quite on target, but they must have primed my subconscious.

There are two main parts of the new code: simpler scanning and regex pattern matching. The new code leaves everything else unchanged (or unused), except the all-important scan method.

*Scanning*

The new code is based on scanning text, not parsing it. You could call it the most important breakthrough. It came to me in the shower. Imo, there would be no way to handle the myriad possible javascript patterns in a parser, even if one had a complete parse tree.

Instead, the new scanner breaks the code into *blocks* based on counts of parens and curly brackets. A block starts with a line that *ends* with an unbalanced parenthesis or curly bracket, and continues up to and including a line at whose end both parens and curly brackets are balanced. At a single stroke, all parsing difficulties disappear. This is *exactly* the kind of line-oriented approach that the javascript importer must have.

*The ScanState class*

All my previous scanners have used collections of variables/ivars to keep track of scan state. But this is the hard way. Instead, a new ScanState class handles all the details. The main methods:

- scan_line: scans a line, updating the internal state, including whether the line is in a string or block comment.
- at_top_level: returns True if parens/brackets are matched and the scan is not in a string/comment.

These helper methods greatly simplify the process of breaking lines into blocks.

*Regex pattern matching*

Each block naturally becomes the body text of a new outline node. But what should the headline be? Just as in the coffeescript importer, the new javascript importer scans the *start* of the block's text, trying to match a regex pattern from a table of such patterns.
The first pattern that matches determines the headline in a straightforward way. Again, this is *exactly* what is needed. It is simple and extensible, and completely replaces parsing or other language-specific information. Here is the heart of the code:

    import re

    # Each table entry is (index of the group holding the name, headline prefix, pattern).
    # An index of 0 means the headline is the prefix alone.
    proto1 = re.compile(
        r'(\s*)Object\.create(\s*)=(\s*)function(.*)\n' +
        r'(\s*)var(\s+)(\w+)(\s*)=(\s*)function',
        re.MULTILINE)
    table = (
        (7, 'proto', proto1),
        (0, 'proto', r'(\s*)Object\.create(\s*)=(\s*)function(\s*)\('),
        (0, 'proto', r'Function\.prototype\.method(\s*)=(\s*)function'),
        (3, 'func', r'(\s*)function(\s+)(\w+)'), # function x
        (3, 'func', r'(\s*)var(\s+)(\w[\w\.]*)(\s*)=(\s*)function\('),
        (3, 'var', r'(\s*)var(\s+)(\w[\w\.]*)(\s*)=(\s*)new(\s+)(\w+)'),
        (3, 'var', r'(\s*)var(\s+)(\w[\w\.]*)(\s*)=(\s*){'),
        (2, 'func', r'(\s*)(\w[\w\.]*)(\s*)=(\s*)function(\s*)\('),
        (6, 'class', r'(\s*)define(\s*)\((\s+)function(\s*)\((\s*)(\w+)'),
        (0, 'class', r'(\s*)define(\s*)\((.*),(\s*)function\('),
    )
    # The rest is part of a method: block is the list of lines in the block,
    # and n is the running number used for blocks that match no pattern.
    s = ''.join(block)
    for i, prefix, pattern in table:
        m = re.match(pattern, s)
        if m:
            name = prefix + ' ' + (m.group(i) if i else '')
            return n, name.strip()
    # No pattern matched: fall back to a numbered generic headline.
    return n+1, 'block %s' % (n)

The great thing about this is that I can surf the web looking for javascript patterns. When I find a new one, I can add it to the table.

*Rescanning*

The scan method creates child nodes. The rescan method rescans the body text of the children, looking for new blocks that can be turned into grandchild nodes, great-grandchild nodes, etc. This code is not quite ready.

In fact, there are subtle issues about when to rescan. The present code sets a limit, say 50 lines. There is not much point in rescanning a node with fewer lines. Rescanning also sometimes splits if/else statements into blocks. We might not want to do this if the *created* blocks would be shorter than, say, another threshold value, which may or may not be 50 lines. At present, this test is not done, and maybe it won't ever be.

*Summary*

The new scheme is already a huge success.
Decisions never involve scan state; they are made at a much higher level. Even with bugs, the new code already handles javascript much more robustly than the old code. I'll be upping the new code later today.

Edward
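
To make the *Scanning* and *ScanState* descriptions concrete, here is a minimal sketch of how such a scanner could break lines into blocks. It is not the importer's actual code. Only the names ScanState, scan_line and at_top_level come from the description above; the ivars, the split_blocks helper, and the choice to treat every balanced top-level line as its own one-line block are assumptions made for illustration. The sketch also ignores regex literals and template strings, which a real scanner would have to handle.

```python
class ScanState:
    """Hypothetical sketch: tracks nesting and string/comment context across lines."""

    def __init__(self):
        self.parens = 0          # net count of unmatched '('
        self.curlies = 0         # net count of unmatched '{'
        self.in_comment = False  # True while inside /* ... */
        self.quote = None        # the open quote character while inside a string

    def at_top_level(self):
        """True if parens/curlies are balanced and we are not in a string/comment."""
        return (self.parens == 0 and self.curlies == 0
                and not self.in_comment and self.quote is None)

    def scan_line(self, line):
        """Scan one line, updating the counts and the string/comment state."""
        i = 0
        while i < len(line):
            ch, two = line[i], line[i:i + 2]
            if self.in_comment:
                if two == '*/':
                    self.in_comment = False
                    i += 1  # skip the second character of '*/'
            elif self.quote:
                if ch == '\\':
                    i += 1  # skip the escaped character
                elif ch == self.quote:
                    self.quote = None
            elif two == '//':
                break  # the rest of the line is a comment
            elif two == '/*':
                self.in_comment = True
                i += 1  # skip the second character of '/*'
            elif ch in '"\'':
                self.quote = ch
            elif ch == '(':
                self.parens += 1
            elif ch == ')':
                self.parens -= 1
            elif ch == '{':
                self.curlies += 1
            elif ch == '}':
                self.curlies -= 1
            i += 1


def split_blocks(lines):
    """Group lines into blocks; a block ends at a line that returns to top level."""
    state = ScanState()
    blocks, current = [], []
    for line in lines:
        current.append(line)
        state.scan_line(line)
        if state.at_top_level():
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)  # keep any unbalanced tail rather than dropping it
    return blocks


if __name__ == '__main__':
    source = [
        "var x = 1;\n",
        "function f(a) {\n",
        "    return '}' + a; // this { is ignored\n",
        "}\n",
    ]
    for block in split_blocks(source):
        print(repr(''.join(block)))
```

Note that the loop in split_blocks never looks inside the state: it only asks at_top_level after each line, which is the sense in which decisions are made at a higher level than the scan state.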
