12.07.2013, 20:13, "John Regehr" <[email protected]>:

>  Thanks for letting me know.  I had intended to use a real lexer for a
>  long time but somehow only got around to it recently.
>
>  Konstantin, Yang and I are trying to figure out how to speed up the
>  initial part of C-Reduce when it is given a very large C++ file.  The
>  line-based passes are just not that great.
>
>  Maybe you can give us feedback on our current idea.  The idea is to
>  remove function bodies.  This can be done either by replacing a
>  definition with a declaration, or simply by stripping everything out of
>  the function definition (except for an appropriate "return" statement,
>  obviously).
>
>  My current idea is to reuse the line-based logic.  In other words, we
>  first try to delete all function bodies, then the first half of them,
>  then the second half, then the first quarter, etc...
>
>  I think that if this is implemented wisely, a large speedup may be
>  possible.  Does this seem reasonable?

The idea is fine, but I'm not sure it's suitable for the initial part.
My concern is parsing speed. If the solution uses clang, it may take
a long time to parse a large file and find the function bodies on each
iteration, in contrast to the "dumb" delta passes relying on topformflat.

I have another proposal.

In most cases with real (not generated) C++ code, the most generic
code from the C and C++ standard libraries sits in the top part of the
translation unit, and the most specific code near the bottom. I've done
several reductions the following way:

1. Split the preprocessed source into a large "header" and a small "source".

I believe this is a place where "a little brain time can save a lot of
CPU time" (tm), but I think in practice it can be automated, e.g. by
splitting off the bottom 1/5 or 200K of the file (whichever is larger).

2. Make a precompiled header from this header part (if several compilers
are involved, each one requires its own copy of the header and PCH -
tedious to do by hand).

3. Reduce the small "source" part.

4. Check whether the source still depends on the header; if not, we are done.

5. If the header is large, split it into two parts again, making a new
source and header, and repeat 1-4.

6. cat the two parts together and reduce the result.

I see that some intelligence is needed to reach the exit in step (4);
however, line-based reduction goes much faster when there is more unused
library code to discard.
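The split point in step 1 can be computed automatically; here is a minimal
sketch of the heuristic, assuming only the 1/5-or-200K rule described
above (the function name and the byte-oriented cut are my own choices,
not part of any existing tool):

```python
def split_point(total_bytes, frac=5, min_tail=200 * 1024):
    """Byte offset at which to cut the preprocessed file.

    Everything before the offset becomes the large "header" part;
    the bottom 1/frac of the file or min_tail bytes, whichever is
    larger, becomes the small "source" part that gets reduced.
    """
    tail = max(total_bytes // frac, min_tail)
    tail = min(tail, total_bytes)  # a tiny file becomes all "source"
    return total_bytes - tail

# A 5 MB preprocessed file: 1/5 (1 MB) beats 200K, so the bottom
# 1 MB is split off as the "source" part.
offset = split_point(5 * 1024 * 1024)
```

Cutting at a raw byte offset can land mid-declaration, so in practice the
cut would be snapped to the nearest line (or topformflat-flattened chunk)
boundary before writing the two files.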


-- 
Regards,
Konstantin
