Great! CS FTW! --Guido
On Fri, Jul 30, 2010 at 1:28 PM, George Boutsioukis <gboutsiou...@gmail.com> wrote: > Hi everyone, I haven't had a chance to introduce myself yet. I'm George > Boutsioukis, a CS student from Greece and I'm currently enrolled as a GSOC > student for the PSF. The task I was involved with for the past few weeks was > speeding up the 2to3 tool. > > For those who are not aware of 2to3's internals, the tool matches a series > of tree patterns to a python syntax tree and applies a code transformation > on each match. The real bottleneck of this tool is the tree pattern matching > algorithm, since the current version traverses all nodes of the tree, > top-to-bottom, and checks each node against a set of tree patterns. > > The new algorithm I've implemented reduces the candidate nodes for matching. > The process works in three steps: > > 1) Each pattern tree is reduced to the most 'characteristic' path, i.e. the > rarest path you'll encounter on a real code tree. (this is simpler than it > sounds) > > 2) From the set of the above linear patterns an Aho-Corasick automaton is > constructed that is capable of inclusively matching all patterns(meaning > that it may mark a wrong candidate but will never miss an instance) > > 3) The automaton is run on each leaf until the root of the tree is reached > and the resulting match set is returned. And finally we apply each fixer > for each node in the match set. Of course the process is a bit more > complicated since we have to recheck the transformed code for new matches etc. > > Since it is quite a hard concept to illustrate in a post, tell me if you > have any questions or need more info. > > The benefit of this somewhat complex process is substantial; the total run > time is reduced by roughly 50%. The new module is still experimental code, > but mature enough to pass all tests. I propose to include the new module > with the next version of Python, perhaps as an explicit option for 2to3 > until it can be thoroughly tested. > > The code repository can be found here: > https://code.google.com/p/2to3-speedup2/source/browse/ > (the code is a fork from py3k's 2to3 and was tested with version 3.1.2) > > The blog I've been using for GSOC(containing more details on the matching > algorithm): http://george-gsoc.blogspot.com/ > > Regards, > George > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com