On Feb 22, 2011, at 11:21 AM, Kaelyn Uhrain wrote:

> This patch implements checking available C++ namespaces when trying to find 
> typo corrections for unknown identifiers.  There are currently 4 broken unit 
> tests (CXX/basic/basic.lookup/basic.lookup.argdep/p4.cpp, 
> CXX/class/class.local/p1.cpp, FixIt/typo.cpp, and 
> SemaObjC/synth-provisional-ivars.m) which I'm fairly sure are cases where the 
> tests need to be updated to reflect the new behavior instead of being 
> indicative of actual code breakage, but I'm not 100% certain.  The new unit 
> test, test/Sema/missing-namespace-qualifier-typo-corrections.cpp, is a good 
> demonstration of the improved suggestions provided by this patch, as most of 
> the errors found by "clang -fsyntax-only" lack suggestions without this 
> patch.

This is really cool, and would be a great step forward in usability.

My only real concern is performance, so I did a quick measurement. My test was 
to put all of the C++ Standard Library headers into a PCH file, then include 
it in a program that amounted to, basically,

        int main() {
                vector<int> v;       // typo: missing the std:: qualifier
                std::Vector<int> v2; // typo: should be std::vector
        }

so there are two typo corrections. And here is some timing data from 
before/after the patch:

Before:

*** AST File Statistics:
  18 stat cache hits
  2 stat cache misses
  97/39174 source location entries read (0.247613%)
  11142/18925 types read (58.874504%)
  16712/32353 declarations read (51.655178%)
  1734/6785 identifiers read (25.556374%)
  1861/70867 statements read (2.626046%)
  0/1766 macros read (0.000000%)
  188/3464 lexical declcontexts read (5.427252%)
  43/695 visible declcontexts read (6.187050%)

Wall time: 0m0.051s

After:

*** AST File Statistics:
  18 stat cache hits
  2 stat cache misses
  97/39174 source location entries read (0.247613%)
  17162/18925 types read (90.684280%)
  28421/32353 declarations read (87.846565%)
  3943/6785 identifiers read (58.113487%)
  70371/70867 statements read (99.300095%)
  0/1766 macros read (0.000000%)
  3383/3464 lexical declcontexts read (97.661659%)
  6080/695 visible declcontexts read (874.820129%)

Wall time: 0m0.115s

So it's a little over 2x slower (0.115s vs. 0.051s), and (more importantly) it 
deserializes a whole lot more from the PCH file. That's a real concern, since 
typo correction still needs to be fast for many applications, and the 
additional disk I/O really becomes a problem as the PCH files get larger 
(e.g., containing significant chunks of Boost).

I have a recommendation. Rather than using the recursive AST visitor to walk 
the entire AST, can we instead 

  (1) First do the "identifier" lookup that we already do, to find the best 
name that we know about (which could be anywhere in the translation unit), then
  (2) If lookup for that identifier fails, go look in all of the namespaces 
that we know about and see if the identifier is located in one of those.

We'd need some list of known namespaces stored somewhere (and saved in the PCH 
file for fast retrieval), but at least this way we'd only be paying the cost we 
already pay now to look at the identifiers, plus a cost that's O(number of 
unique namespaces).
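
To make that concrete, here's a rough standalone sketch of the two steps. It is 
not Clang code: TranslationUnit, KnownNamespaces, correctTypo, and the toy edit 
distance below are all invented stand-ins for illustration.

#include <algorithm>
#include <iostream>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

struct TranslationUnit {
  std::set<std::string> Identifiers;                    // all identifiers we know about
  std::set<std::string> KnownNamespaces;                // e.g. "std", "boost", ...
  std::map<std::string, std::set<std::string>> Members; // namespace -> names declared in it
  std::set<std::string> UnqualifiedNames;               // names visible to unqualified lookup
};

// Classic edit distance; stands in for the scoring the typo correction already does.
static unsigned editDistance(const std::string &A, const std::string &B) {
  std::vector<unsigned> Prev(B.size() + 1), Cur(B.size() + 1);
  for (unsigned j = 0; j <= B.size(); ++j) Prev[j] = j;
  for (unsigned i = 1; i <= A.size(); ++i) {
    Cur[0] = i;
    for (unsigned j = 1; j <= B.size(); ++j)
      Cur[j] = std::min({Prev[j] + 1, Cur[j - 1] + 1,
                         Prev[j - 1] + (A[i - 1] == B[j - 1] ? 0u : 1u)});
    std::swap(Prev, Cur);
  }
  return Prev[B.size()];
}

// Step (1): the identifier-level correction we already pay for today.
static std::optional<std::string>
bestIdentifierMatch(const std::set<std::string> &Ids, const std::string &Typo) {
  std::optional<std::string> Best;
  unsigned BestDist = Typo.size() / 2 + 1; // crude "close enough" threshold
  for (const auto &Id : Ids) {
    unsigned D = editDistance(Id, Typo);
    if (D < BestDist) { BestDist = D; Best = Id; }
  }
  return Best;
}

struct Correction {
  std::string Name;
  std::optional<std::string> Namespace; // set when a qualifier must be added
};

static std::optional<Correction> correctTypo(const TranslationUnit &TU,
                                             const std::string &Typo) {
  auto Best = bestIdentifierMatch(TU.Identifiers, Typo);
  if (!Best)
    return std::nullopt;

  // If the corrected name is already visible to unqualified lookup, suggest it as-is.
  if (TU.UnqualifiedNames.count(*Best))
    return Correction{*Best, std::nullopt};

  // Step (2): lookup failed, so check only the known namespaces. That is a cost
  // of O(number of unique namespaces), not a walk of the entire (PCH-backed) AST.
  for (const auto &NS : TU.KnownNamespaces) {
    auto It = TU.Members.find(NS);
    if (It != TU.Members.end() && It->second.count(*Best))
      return Correction{*Best, NS}; // i.e. suggest "NS::Best"
  }
  return std::nullopt;
}

int main() {
  TranslationUnit TU;
  TU.Identifiers = {"vector", "string", "main"};
  TU.KnownNamespaces = {"std"};
  TU.Members["std"] = {"vector", "string"};
  TU.UnqualifiedNames = {"main"}; // std::vector is *not* visible unqualified

  if (auto C = correctTypo(TU, "Vector"))
    std::cout << "did you mean '"
              << (C->Namespace ? *C->Namespace + "::" : "") << C->Name
              << "'?\n"; // prints: did you mean 'std::vector'?
}

The point is just that step (2) only consults the known-namespace list (which 
is what we'd serialize into the PCH), so nothing beyond those namespaces' 
lookup tables would need to be deserialized.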

Does that seem reasonable?


        - Doug