Still digesting your reply as time permits (and thank you for the feedback!) but have one small question/issue regarding your suggestion of using a SmallPtrSet for storing the BestResults: creating a SmallPtrSet requires setting a fixed size for the set--how would one determine a reasonable bounding size for the set?
Thanks, Kaelyn On Mon, Mar 14, 2011 at 10:28 AM, Douglas Gregor <[email protected]> wrote: > > On Mar 9, 2011, at 11:35 AM, Kaelyn Uhrain wrote: > > > I just finished up a new version of my patch earlier this week, which no > longer uses the RecursiveASTVisitor (per chandlerc's suggestion). I've > rebased the patch against the current SVN and have attached it for you to > test. With the new version only the FixIt/typo.cpp and > SemaObjC/synth-provisional-ivars.m tests are failing, and for those two I'm > not entirely sure which parts of the behavior are correct/acceptable and > which might be a bug in my changes. > > Definitely looking better! I don't have the time at the moment to re-run my > performance tests, but I will comment on some aspects of your patch... > > > Index: lib/Sema/SemaDecl.cpp > =================================================================== > --- lib/Sema/SemaDecl.cpp (revision 127346) > +++ lib/Sema/SemaDecl.cpp (working copy) > @@ -264,24 +264,24 @@ > > if (DeclarationName Corrected = CorrectTypo(Lookup, S, SS, 0, 0, > CTC_Type)) { > if (NamedDecl *Result = Lookup.getAsSingle<NamedDecl>()) { > + std::string Suggestion = Result->getQualifiedNameAsString(); > + // FIXME: Silly formatting rules make us have to quote strings > ourselves > + // to keep quoting consistent between strings and IdentiferInfos. > + std::string QuotedSug = "'" + Suggestion + "'"; > > getQualifiedNameAsString() isn't necessarily what we want here, because it > always produces the fully-qualified name. That means it will overqualify > when we're already in a namespace, e.g., > > namespace N { > namespace inner { > class vector { /* ... */ }; > } > > void f() { > vector v; // will suggest N::inner::vector, when > inner::vector would suffice > } > } > > A similar issue is inline namespaces. For example, if one has: > > namespace std { > inline namespace __1 { > class vector { /* ... */ }; > } > } > > void f() { > vector v; // will suggest std::__1::vector, when std::vector > would suffice > } > > It would be preferable to add some kind of function that computes the > minimum qualification needed to get from the specified context (e.g., > CurContext, for unqualified name lookup) to the entity, since that's what > the user would want to type. > > Incidentally, when testing this, I actually used > > template<typename T> class vector { /* ... */ }; > > in my examples, and got incorrect Fix-Its and diagnostics: > > t.cpp:7:4: error: no template named 'vector'; did you mean 'vector'? > vector<int> v; // will suggest ... > ^~~~~~ > vector > > > +void TypoCorrectionConsumer::CheckEditDistanceAndStore( > + llvm::StringRef Name, DeclSpecPair DSPair, unsigned DistOffset) { > + using namespace std; > + > // Compute an upper bound on the allowable edit distance, so that the > // edit-distance algorithm can short-circuit. > unsigned UpperBound = min(unsigned((Typo.size() + 2) / 3), > BestEditDistance); > @@ -2872,7 +2884,7 @@ > // Compute the edit distance between the typo and the name of this > // entity. If this edit distance is not worse than the best edit > // distance we've seen so far, add it to the list of results. > - unsigned ED = Typo.edit_distance(Name, true, UpperBound); > + unsigned ED = Typo.edit_distance(Name, true, UpperBound) + DistOffset; > if (ED == 0) > return; > > The DistOffset should be added into the UpperBound, so that the > edit_distance implementation can abort early if the resulting edit distance > is going to be too high. > > + // Only consider entities that haven't already been found. > + DeclContext *NDContext = ND->getDeclContext()->getPrimaryContext(); > + iterator N = BestResults.find(ShortName); > + if (N != BestResults.end()) { > + for (DeclSpecList::iterator D = N->second.begin(), DEnd = > N->second.end(); > + D != DEnd; ++D) { > + if (NDContext == D->first) return; > + } > + } > > Should we use a SmallPtrSet for this, so that we get faster lookup if the > BestResults set happens to get large? > > +void ScanNamespaceForCorrections(DeclContext *DC, > + TypoCorrectionConsumer &Consumer, > + unsigned DistanceOffset, bool DownOnly) { > + > + if (DistanceOffset > Consumer.getBestEditDistance()) > + return; > + > + DeclContext *ParentDC = DC->getLookupParent(); > + > + for (DC = DC->getPrimaryContext(); DC != NULL; DC = > DC->getNextContext()) { > + for (DeclContext::decl_iterator D = DC->decls_begin(), > + DEnd = DC->decls_end(); D != DEnd; ++D) { > + NamedDecl *ChildDecl = dyn_cast<NamedDecl>(D->getCanonicalDecl()); > + NamespaceDecl *ChildND = dyn_cast_or_null<NamespaceDecl>(*D); > + if (ChildDecl) > + Consumer.FoundDecl(ChildDecl, NULL, false, DistanceOffset); > + if (ChildND) > + ScanNamespaceForCorrections(ChildND, Consumer, > + DistanceOffset + 1, true); > + } > + } > + > + if (DownOnly || !ParentDC) return; > + > + ScanNamespaceForCorrections(ParentDC, Consumer, DistanceOffset, false); > +} > > This will end up walking most of the translation unit. It's better than the > RecursiveASTVisitor, since it won't walk into function bodies or the > definitions of classes, but there's still some redundant work. I'll make a > concrete suggestion at the end of this reply. > > + for (DeclSpecList::iterator DSI = I->second.begin(), > + DSIEnd = I->second.end(); > + DSI != DSIEnd; /* Increment in loop, */) { > + // Perform name lookup on this name. > + IdentifierInfo *Name = &Context.Idents.get(I->getKey()); > + LookupPotentialTypoResult(*this, Res, Name, S, SS, > + DSI->second ? DSI->first : MemberContext, > + EnteringContext, CTC); > > Using DSI->first as the member context isn't quite what we want, because we > don't want to turn an unqualified lookup into a member lookup. Rather, we > want to turn an unqualified lookup into a qualified lookup. So, I suggest > that you form a new nested-name-specifier (e.g., a new "SS") that provides > the minimum qualification needed to get from the current context over to the > result we found (this is related to my first comment). If that works, you > could write the results back to SS, so that our caller can go ahead and use > that nested-name-specifier for the rest of semantic analysis as if the user > had written it explicitly. (Naturally, callers would need some updating to > understand that this could happen). The key here is that, on successfully > returning an alternative result from typo correction, the name-lookup state > looks *exactly* as it would had the user typed the right thing. > > Incidentally, I think this is the source of the > SemaObjC/synth-provisional-ivars.m failure, since we're allowing lookup into > an Objective-C class even though one can never qualify the name of an entity > in an Objective-C class. We shouldn't ever do qualifier checks in > (Objective-)C, although it's happening here. > > Back to my comment about repeated work. The typo correction code for > unqualified lookup makes a pass over all of the identifiers known in the > translation unit, which includes all of the names of entities buried in > namespaces. So, when we also go walk the AST to find all of the namespaces > (and the members of those namespaces), so we end up hitting many of those > identifiers a second time, and traversing most of the AST in the process > (which is more expensive than just looking at the identifiers). So, here's a > concrete recommendation: > > 1) Don't walk the namespaces at all; let the check against all of > the identifiers in the translation unit winnow the list down to the best > candidates based on name alone. We don't know which of these candidates is > visible, or how to get there. > 2) When we're performing lookup for each candidate (e.g., the calls > to LookupPotentialTypoResult), first try unqualified lookup. If that fails, > save the candidate in a separate "try qualified name lookup" set. > 3) If none of the candidates can be found by unqualified lookup, > start looking in each of the namespaces we know about to see if qualified > name lookup would work to find each of the candidates. You'll likely need a > side structure containing all of the (unique, canonicalized) NamespaceDecls > in the translation unit to make this efficient. > > Implemented this way, there's no additional penalty for typo correction of > visible names. Moreover, we skip the AST walk entirely, and only perform > targeted lookups for names that we know exist (because they were in the > identifier table). This also permits a few more optimizations: > > - We could (roughly) order namespaces based on their depth, so that > we'll tend to visit the more top-level (and, therefore, more likely to have > short qualifiers) namespaces first. If the qualifier for a namespace would > be longer than the shortest qualifier we've found so far, we don't have to > look in that namespace. > > - For a given namespace, we could look for all of the candidates in > that namespace at the same time, improving locality (which is particularly > import when those lookups hit the disk in the PCH case). > > Sorry for the long e-mail; you've got me thinking a lot about this :) > > - Doug
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
