On Mar 9, 2011, at 11:35 AM, Kaelyn Uhrain wrote:
> I just finished up a new version of my patch earlier this week, which no
> longer uses the RecursiveASTVisitor (per chandlerc's suggestion). I've
> rebased the patch against the current SVN and have attached it for you to
> test. With the new version only the FixIt/typo.cpp and
> SemaObjC/synth-provisional-ivars.m tests are failing, and for those two I'm
> not entirely sure which parts of the behavior are correct/acceptable and
> which might be a bug in my changes.
Definitely looking better! I don't have the time at the moment to re-run my
performance tests, but I will comment on some aspects of your patch...
Index: lib/Sema/SemaDecl.cpp
===================================================================
--- lib/Sema/SemaDecl.cpp (revision 127346)
+++ lib/Sema/SemaDecl.cpp (working copy)
@@ -264,24 +264,24 @@
if (DeclarationName Corrected = CorrectTypo(Lookup, S, SS, 0, 0, CTC_Type)) {
if (NamedDecl *Result = Lookup.getAsSingle<NamedDecl>()) {
+ std::string Suggestion = Result->getQualifiedNameAsString();
+ // FIXME: Silly formatting rules make us have to quote strings ourselves
+ // to keep quoting consistent between strings and IdentiferInfos.
+ std::string QuotedSug = "'" + Suggestion + "'";
getQualifiedNameAsString() isn't necessarily what we want here, because it
always produces the fully-qualified name. That means it will overqualify when
we're already in a namespace, e.g.,
namespace N {
namespace inner {
class vector { /* ... */ };
}
void f() {
vector v; // will suggest N::inner::vector, when
inner::vector would suffice
}
}
A similar issue is inline namespaces. For example, if one has:
namespace std {
inline namespace __1 {
class vector { /* ... */ };
}
}
void f() {
vector v; // will suggest std::__1::vector, when std::vector
would suffice
}
It would be preferable to add some kind of function that computes the minimum
qualification needed to get from the specified context (e.g., CurContext, for
unqualified name lookup) to the entity, since that's what the user would want
to type.
Incidentally, when testing this, I actually used
template<typename T> class vector { /* ... */ };
in my examples, and got incorrect Fix-Its and diagnostics:
t.cpp:7:4: error: no template named 'vector'; did you mean 'vector'?
vector<int> v; // will suggest ...
^~~~~~
vector
+void TypoCorrectionConsumer::CheckEditDistanceAndStore(
+ llvm::StringRef Name, DeclSpecPair DSPair, unsigned DistOffset) {
+ using namespace std;
+
// Compute an upper bound on the allowable edit distance, so that the
// edit-distance algorithm can short-circuit.
unsigned UpperBound = min(unsigned((Typo.size() + 2) / 3), BestEditDistance);
@@ -2872,7 +2884,7 @@
// Compute the edit distance between the typo and the name of this
// entity. If this edit distance is not worse than the best edit
// distance we've seen so far, add it to the list of results.
- unsigned ED = Typo.edit_distance(Name, true, UpperBound);
+ unsigned ED = Typo.edit_distance(Name, true, UpperBound) + DistOffset;
if (ED == 0)
return;
The DistOffset should be added into the UpperBound, so that the edit_distance
implementation can abort early if the resulting edit distance is going to be
too high.
+ // Only consider entities that haven't already been found.
+ DeclContext *NDContext = ND->getDeclContext()->getPrimaryContext();
+ iterator N = BestResults.find(ShortName);
+ if (N != BestResults.end()) {
+ for (DeclSpecList::iterator D = N->second.begin(), DEnd = N->second.end();
+ D != DEnd; ++D) {
+ if (NDContext == D->first) return;
+ }
+ }
Should we use a SmallPtrSet for this, so that we get faster lookup if the
BestResults set happens to get large?
+void ScanNamespaceForCorrections(DeclContext *DC,
+ TypoCorrectionConsumer &Consumer,
+ unsigned DistanceOffset, bool DownOnly) {
+
+ if (DistanceOffset > Consumer.getBestEditDistance())
+ return;
+
+ DeclContext *ParentDC = DC->getLookupParent();
+
+ for (DC = DC->getPrimaryContext(); DC != NULL; DC = DC->getNextContext()) {
+ for (DeclContext::decl_iterator D = DC->decls_begin(),
+ DEnd = DC->decls_end(); D != DEnd; ++D) {
+ NamedDecl *ChildDecl = dyn_cast<NamedDecl>(D->getCanonicalDecl());
+ NamespaceDecl *ChildND = dyn_cast_or_null<NamespaceDecl>(*D);
+ if (ChildDecl)
+ Consumer.FoundDecl(ChildDecl, NULL, false, DistanceOffset);
+ if (ChildND)
+ ScanNamespaceForCorrections(ChildND, Consumer,
+ DistanceOffset + 1, true);
+ }
+ }
+
+ if (DownOnly || !ParentDC) return;
+
+ ScanNamespaceForCorrections(ParentDC, Consumer, DistanceOffset, false);
+}
This will end up walking most of the translation unit. It's better than the
RecursiveASTVisitor, since it won't walk into function bodies or the
definitions of classes, but there's still some redundant work. I'll make a
concrete suggestion at the end of this reply.
+ for (DeclSpecList::iterator DSI = I->second.begin(),
+ DSIEnd = I->second.end();
+ DSI != DSIEnd; /* Increment in loop, */) {
+ // Perform name lookup on this name.
+ IdentifierInfo *Name = &Context.Idents.get(I->getKey());
+ LookupPotentialTypoResult(*this, Res, Name, S, SS,
+ DSI->second ? DSI->first : MemberContext,
+ EnteringContext, CTC);
Using DSI->first as the member context isn't quite what we want, because we
don't want to turn an unqualified lookup into a member lookup. Rather, we want
to turn an unqualified lookup into a qualified lookup. So, I suggest that you
form a new nested-name-specifier (e.g., a new "SS") that provides the minimum
qualification needed to get from the current context over to the result we
found (this is related to my first comment). If that works, you could write the
results back to SS, so that our caller can go ahead and use that
nested-name-specifier for the rest of semantic analysis as if the user had
written it explicitly. (Naturally, callers would need some updating to
understand that this could happen). The key here is that, on successfully
returning an alternative result from typo correction, the name-lookup state
looks *exactly* as it would had the user typed the right thing.
Incidentally, I think this is the source of the
SemaObjC/synth-provisional-ivars.m failure, since we're allowing lookup into an
Objective-C class even though one can never qualify the name of an entity in an
Objective-C class. We shouldn't ever do qualifier checks in (Objective-)C,
although it's happening here.
Back to my comment about repeated work. The typo correction code for
unqualified lookup makes a pass over all of the identifiers known in the
translation unit, which includes all of the names of entities buried in
namespaces. So, when we also go walk the AST to find all of the namespaces (and
the members of those namespaces), so we end up hitting many of those
identifiers a second time, and traversing most of the AST in the process (which
is more expensive than just looking at the identifiers). So, here's a concrete
recommendation:
1) Don't walk the namespaces at all; let the check against all of the
identifiers in the translation unit winnow the list down to the best candidates
based on name alone. We don't know which of these candidates is visible, or how
to get there.
2) When we're performing lookup for each candidate (e.g., the calls to
LookupPotentialTypoResult), first try unqualified lookup. If that fails, save
the candidate in a separate "try qualified name lookup" set.
3) If none of the candidates can be found by unqualified lookup, start
looking in each of the namespaces we know about to see if qualified name lookup
would work to find each of the candidates. You'll likely need a side structure
containing all of the (unique, canonicalized) NamespaceDecls in the translation
unit to make this efficient.
Implemented this way, there's no additional penalty for typo correction of
visible names. Moreover, we skip the AST walk entirely, and only perform
targeted lookups for names that we know exist (because they were in the
identifier table). This also permits a few more optimizations:
- We could (roughly) order namespaces based on their depth, so that
we'll tend to visit the more top-level (and, therefore, more likely to have
short qualifiers) namespaces first. If the qualifier for a namespace would be
longer than the shortest qualifier we've found so far, we don't have to look in
that namespace.
- For a given namespace, we could look for all of the candidates in
that namespace at the same time, improving locality (which is particularly
import when those lookups hit the disk in the PCH case).
Sorry for the long e-mail; you've got me thinking a lot about this :)
- Doug
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits