Re: [cfe-commits] PATCH: Add support for C++ namespace-aware typo correcting

Douglas Gregor Mon, 14 Mar 2011 10:31:02 -0700

On Mar 9, 2011, at 11:35 AM, Kaelyn Uhrain wrote:

> I just finished up a new version of my patch earlier this week, which no 
> longer uses the RecursiveASTVisitor (per chandlerc's suggestion).  I've 
> rebased the patch against the current SVN and have attached it for you to 
> test.  With the new version only the FixIt/typo.cpp and 
> SemaObjC/synth-provisional-ivars.m tests are failing, and for those two I'm 
> not entirely sure which parts of the behavior are correct/acceptable and 
> which might be a bug in my changes.


Definitely looking better! I don't have the time at the moment to re-run my 
performance tests, but I will comment on some aspects of your patch...


Index: lib/Sema/SemaDecl.cpp
===================================================================
--- lib/Sema/SemaDecl.cpp       (revision 127346)
+++ lib/Sema/SemaDecl.cpp       (working copy)
@@ -264,24 +264,24 @@
 
   if (DeclarationName Corrected = CorrectTypo(Lookup, S, SS, 0, 0, CTC_Type)) {
     if (NamedDecl *Result = Lookup.getAsSingle<NamedDecl>()) {
+      std::string Suggestion = Result->getQualifiedNameAsString();
+      // FIXME: Silly formatting rules make us have to quote strings ourselves
+      // to keep quoting consistent between strings and IdentiferInfos.
+      std::string QuotedSug = "'" + Suggestion + "'";

getQualifiedNameAsString() isn't necessarily what we want here, because it 
always produces the fully-qualified name. That means it will overqualify when 
we're already in a namespace, e.g.,

        namespace N {
                namespace inner {
                        class vector { /* ... */ };
                }

                void f() {
                        vector v; // will suggest N::inner::vector, when 
inner::vector would suffice
                }
        }

A similar issue is inline namespaces. For example, if one has:

        namespace std {
                inline namespace __1 {
                        class vector { /* ... */ };
                }
        }

        void f() {
                vector v; // will suggest std::__1::vector, when std::vector 
would suffice
        }

It would be preferable to add some kind of function that computes the minimum 
qualification needed to get from the specified context (e.g., CurContext, for 
unqualified name lookup) to the entity, since that's what the user would want 
to type. 

Incidentally, when testing this, I actually used

        template<typename T> class vector { /* ... */ };

in my examples, and got incorrect Fix-Its and diagnostics:

t.cpp:7:4: error: no template named 'vector'; did you mean 'vector'?
                        vector<int> v; // will suggest ...
                        ^~~~~~
                        vector


+void TypoCorrectionConsumer::CheckEditDistanceAndStore(
+    llvm::StringRef Name, DeclSpecPair DSPair, unsigned DistOffset) {
+  using namespace std;
+
   // Compute an upper bound on the allowable edit distance, so that the
   // edit-distance algorithm can short-circuit.
   unsigned UpperBound = min(unsigned((Typo.size() + 2) / 3), BestEditDistance);
@@ -2872,7 +2884,7 @@
   // Compute the edit distance between the typo and the name of this
   // entity. If this edit distance is not worse than the best edit
   // distance we've seen so far, add it to the list of results.
-  unsigned ED = Typo.edit_distance(Name, true, UpperBound);
+  unsigned ED = Typo.edit_distance(Name, true, UpperBound) + DistOffset;
   if (ED == 0)
     return;

The DistOffset should be added into the UpperBound, so that the edit_distance 
implementation can abort early if the resulting edit distance is going to be 
too high.

+  // Only consider entities that haven't already been found.
+  DeclContext *NDContext = ND->getDeclContext()->getPrimaryContext();
+  iterator N = BestResults.find(ShortName);
+  if (N != BestResults.end()) {
+    for (DeclSpecList::iterator D = N->second.begin(), DEnd = N->second.end();
+         D != DEnd; ++D) {
+      if (NDContext == D->first) return;
+    }
+  }

Should we use a SmallPtrSet for this, so that we get faster lookup if the 
BestResults set happens to get large?

+void ScanNamespaceForCorrections(DeclContext *DC,
+                                 TypoCorrectionConsumer &Consumer,
+                                 unsigned DistanceOffset, bool DownOnly) {
+
+  if (DistanceOffset > Consumer.getBestEditDistance())
+    return;
+
+  DeclContext *ParentDC = DC->getLookupParent();
+
+  for (DC = DC->getPrimaryContext(); DC != NULL; DC = DC->getNextContext()) {
+    for (DeclContext::decl_iterator D = DC->decls_begin(),
+         DEnd = DC->decls_end(); D != DEnd; ++D) {
+      NamedDecl *ChildDecl = dyn_cast<NamedDecl>(D->getCanonicalDecl());
+      NamespaceDecl *ChildND = dyn_cast_or_null<NamespaceDecl>(*D);
+      if (ChildDecl)
+        Consumer.FoundDecl(ChildDecl, NULL, false, DistanceOffset);
+      if (ChildND)
+        ScanNamespaceForCorrections(ChildND, Consumer,
+                                    DistanceOffset + 1, true);
+    }
+  }
+
+  if (DownOnly || !ParentDC) return;
+
+  ScanNamespaceForCorrections(ParentDC, Consumer, DistanceOffset, false);
+}

This will end up walking most of the translation unit. It's better than the 
RecursiveASTVisitor, since it won't walk into function bodies or the 
definitions of classes, but there's still some redundant work. I'll make a 
concrete suggestion at the end of this reply.

+    for (DeclSpecList::iterator DSI = I->second.begin(),
+                                DSIEnd = I->second.end();
+         DSI != DSIEnd; /* Increment in loop, */) {
+      // Perform name lookup on this name.
+      IdentifierInfo *Name = &Context.Idents.get(I->getKey());
+      LookupPotentialTypoResult(*this, Res, Name, S, SS,
+                                DSI->second ? DSI->first : MemberContext,
+                                EnteringContext, CTC);

Using DSI->first as the member context isn't quite what we want, because we 
don't want to turn an unqualified lookup into a member lookup. Rather, we want 
to turn  an unqualified lookup into a qualified lookup. So, I suggest that you 
form a new nested-name-specifier (e.g., a new "SS") that provides the minimum 
qualification needed to get from the current context over to the result we 
found (this is related to my first comment). If that works, you could write the 
results back to SS, so that our caller can go ahead and use that 
nested-name-specifier for the rest of semantic analysis as if the user had 
written it explicitly. (Naturally, callers would need some updating to 
understand that this could happen). The key here is that, on successfully 
returning an alternative result from typo correction, the name-lookup state 
looks *exactly* as it would had the user typed the right thing.

Incidentally, I think this is the source of the 
SemaObjC/synth-provisional-ivars.m failure, since we're allowing lookup into an 
Objective-C class even though one can never qualify the name of an entity in an 
Objective-C class. We shouldn't ever do qualifier checks in (Objective-)C, 
although it's happening here.

Back to my comment about repeated work. The typo correction code for 
unqualified lookup makes a pass over all of the identifiers known in the 
translation unit, which includes all of the names of entities buried in 
namespaces. So, when we also go walk the AST to find all of the namespaces (and 
the members of those namespaces), so we end up hitting many of those 
identifiers a second time, and traversing most of the AST in the process (which 
is more expensive than just looking at the identifiers). So, here's a concrete 
recommendation:

        1) Don't walk the namespaces at all; let the check against all of the 
identifiers in the translation unit winnow the list down to the best candidates 
based on name alone. We don't know which of these candidates is visible, or how 
to get there.
        2) When we're performing lookup for each candidate (e.g., the calls to 
LookupPotentialTypoResult), first try unqualified lookup. If that fails, save 
the candidate in a separate "try qualified name lookup" set.
        3) If none of the candidates can be found by unqualified lookup, start 
looking in each of the namespaces we know about to see if qualified name lookup 
would work to find each of the candidates. You'll likely need a side structure 
containing all of the (unique, canonicalized) NamespaceDecls in the translation 
unit to make this efficient.

Implemented this way, there's no additional penalty for typo correction of 
visible names. Moreover, we skip the AST walk entirely, and only perform 
targeted lookups for names that we know exist (because they were in the 
identifier table). This also permits a few more optimizations:

        - We could (roughly) order namespaces based on their depth, so that 
we'll tend to visit the more top-level (and, therefore, more likely to have 
short qualifiers) namespaces first. If the qualifier for a namespace would be 
longer than the shortest qualifier we've found so far, we don't have to look in 
that namespace.

        - For a given namespace, we could look for all of the candidates in 
that namespace at the same time, improving locality (which is particularly 
import when those lookups hit the disk in the PCH case).

Sorry for the long e-mail; you've got me thinking a lot about this :)

        - Doug
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] PATCH: Add support for C++ namespace-aware typo correcting

Reply via email to