The thesis suggests that using Uniplate with the fastest instances
will be around 30% slower than manual operations, but using SYB based
instances will be around 300% slower. These, however, are benchmarks
of generic traversals in isolation, and may not reflect how people
use the libraries in practice - the bottlenecks may be elsewhere.

As one data point, when we decided to use Strafunski (SYB predecessor)
for HaRe [1], it seemed that something had to be done about the various
sources of inefficiency in those generic traversals, but the reduction in
code size and improved readability/maintainability made it a winner anyway, turning an impractical implementation effort into something manageable. In practice, it never became the main performance bottleneck, so we never did anything about it.

That was for one-at-a-time traversals, though (you do a refactoring,
which will do a very small number of traversals for analysis and transformation, and then you are back in the GUI). The situation
is likely to be different when the Ast types are very complex and
large, and when many traversals are chained in sequence.

No boilerplate removal code:                    1:30 min
Uniplate with PlateDirect (fastest method):     1:45 min
Uniplate with PlateData (based on SYB):         5:00 min

And I suspect directly using SYB would have been somewhere in the
10-15min range.

It would be useful to have more information about the causes of
those timings. As I recall, the main sources of inefficiency
in SYB were:
(a) traversing irrelevant parts of the structure (because
    everything is treated generically);
(b) combining results in simple but inefficient ways ((++)
    nesting with the structure);
(c) repeated runtime type checks to determine which function to apply.

While runtime type checks are inherent in SYB (and alternatives
have been proposed that claim to avoid them), all of (a)-(c) can
be addressed to some extent when defining traversals using the
library. There tends to be a trade-off between (a) and (c), as
avoiding senseless generic traversals of specific substructures
implies additional runtime type checks to identify those
substructures (see the various special cases in showData for GHC,
at [2]).
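
To illustrate that trade-off with a sketch (toy types, not GHC's):
one extra runtime type check per node - cost (c) - buys skipping
whole annotation subtrees - cost (a):

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics

data Ann  = Ann String                     deriving (Show, Data, Typeable)
data Expr = Var String | App Ann Expr Expr deriving (Show, Data, Typeable)

-- collect variable names, but never descend into Ann subtrees:
-- the extra 'cast' per node (a type check, cost (c)) avoids
-- generically walking the irrelevant annotations (cost (a))
collectVars :: Data a => a -> [String]
collectVars x
  | Just (Ann _) <- cast x = []   -- prune: don't traverse annotations
  | otherwise = here ++ concat (gmapQ collectVars x)
  where
    here = case cast x of
             Just (Var v) -> [v]
             _            -> []
```

Whether the extra checks pay off depends on how large the pruned
substructures are relative to the rest of the tree.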

To avoid that trade-off, one could specialise often-used
traversal schemes, building in more knowledge about the
layout of GHC Ast types (instead of rediscovering said layout
at runtime in a fully generic traversal). As an extreme step, one
could prototype using SYB, then switch to hand-tuned traversals for
successful but inefficient prototypes. As Neil points out, that
tuning need not even change the interfaces.
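
For instance (again with a hypothetical toy Ast), a generic
renaming prototype and a hand-tuned replacement can share the same
type, so callers never notice the switch:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics

data Expr = Var String | App Expr Expr | Lam String Expr
  deriving (Show, Eq, Data, Typeable)

-- prototype: a one-liner via SYB, paying generic-traversal costs
rename :: (String -> String) -> Expr -> Expr
rename f = everywhere (mkT var)
  where var (Var v) = Var (f v)
        var e       = e

-- hand-tuned drop-in: same type, knows the Expr layout directly
rename' :: (String -> String) -> Expr -> Expr
rename' f (Var v)   = Var (f v)
rename' f (App a b) = App (rename' f a) (rename' f b)
rename' f (Lam v b) = Lam v (rename' f b)
```

(This ignores binders for simplicity; the point is only that the two
definitions are interchangeable at their shared interface.)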

My view is that anyone manipulating an abstract syntax tree inside a
compiler without using some form of generic programming is doing it
wrong.

This also applies to other Ast-manipulating tools. Having Data/
Typeable instances enables tools and experiments that are simply
not practical without them. We should keep an eye on the performance
issues, however.

Claus

[1] 
http://www.cs.kent.ac.uk/projects/refactor-fp/publications/tool-support-for-rfp.pdf
[2] http://hackage.haskell.org/trac/ghc/wiki/GhcApiAstTraversals

_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc
