Re: [cfe-commits] PATCH: Add support for C++ namespace-aware typo correcting

Douglas Gregor Thu, 24 Mar 2011 16:55:00 -0700

On Mar 24, 2011, at 10:37 PM, Kaelyn Uhrain wrote:

> Still digesting your reply as time permits (and thank you for the feedback!) 
> but have one small question/issue regarding your suggestion of using a 
> SmallPtrSet for storing the BestResults: creating a SmallPtrSet requires 
> setting a fixed size for the set--how would one determine a reasonable 
> bounding size for the set?


I suggest 4, just because it seems unlikely that we'll have more than 4 "best" 
results for a given typo.

        - Doug

> Thanks,
> Kaelyn
> 
> On Mon, Mar 14, 2011 at 10:28 AM, Douglas Gregor <[email protected]> wrote:
> 
> On Mar 9, 2011, at 11:35 AM, Kaelyn Uhrain wrote:
> 
> > I just finished up a new version of my patch earlier this week, which no 
> > longer uses the RecursiveASTVisitor (per chandlerc's suggestion).  I've 
> > rebased the patch against the current SVN and have attached it for you to 
> > test.  With the new version only the FixIt/typo.cpp and 
> > SemaObjC/synth-provisional-ivars.m tests are failing, and for those two I'm 
> > not entirely sure which parts of the behavior are correct/acceptable and 
> > which might be a bug in my changes.
> 
> Definitely looking better! I don't have the time at the moment to re-run my 
> performance tests, but I will comment on some aspects of your patch...
> 
> 
> Index: lib/Sema/SemaDecl.cpp
> ===================================================================
> --- lib/Sema/SemaDecl.cpp       (revision 127346)
> +++ lib/Sema/SemaDecl.cpp       (working copy)
> @@ -264,24 +264,24 @@
> 
>   if (DeclarationName Corrected = CorrectTypo(Lookup, S, SS, 0, 0, CTC_Type)) 
> {
>     if (NamedDecl *Result = Lookup.getAsSingle<NamedDecl>()) {
> +      std::string Suggestion = Result->getQualifiedNameAsString();
> +      // FIXME: Silly formatting rules make us have to quote strings 
> ourselves
> +      // to keep quoting consistent between strings and IdentiferInfos.
> +      std::string QuotedSug = "'" + Suggestion + "'";
> 
> getQualifiedNameAsString() isn't necessarily what we want here, because it 
> always produces the fully-qualified name. That means it will overqualify when 
> we're already in a namespace, e.g.,
> 
>        namespace N {
>                namespace inner {
>                        class vector { /* ... */ };
>                }
> 
>                void f() {
>                        vector v; // will suggest N::inner::vector, when 
> inner::vector would suffice
>                }
>        }
> 
> A similar issue is inline namespaces. For example, if one has:
> 
>        namespace std {
>                inline namespace __1 {
>                        class vector { /* ... */ };
>                }
>        }
> 
>        void f() {
>                vector v; // will suggest std::__1::vector, when std::vector 
> would suffice
>        }
> 
> It would be preferable to add some kind of function that computes the minimum 
> qualification needed to get from the specified context (e.g., CurContext, for 
> unqualified name lookup) to the entity, since that's what the user would want 
> to type.
> 
> Incidentally, when testing this, I actually used
> 
>        template<typename T> class vector { /* ... */ };
> 
> in my examples, and got incorrect Fix-Its and diagnostics:
> 
> t.cpp:7:4: error: no template named 'vector'; did you mean 'vector'?
>                        vector<int> v; // will suggest ...
>                        ^~~~~~
>                        vector
> 
> 
> +void TypoCorrectionConsumer::CheckEditDistanceAndStore(
> +    llvm::StringRef Name, DeclSpecPair DSPair, unsigned DistOffset) {
> +  using namespace std;
> +
>   // Compute an upper bound on the allowable edit distance, so that the
>   // edit-distance algorithm can short-circuit.
>   unsigned UpperBound = min(unsigned((Typo.size() + 2) / 3), 
> BestEditDistance);
> @@ -2872,7 +2884,7 @@
>   // Compute the edit distance between the typo and the name of this
>   // entity. If this edit distance is not worse than the best edit
>   // distance we've seen so far, add it to the list of results.
> -  unsigned ED = Typo.edit_distance(Name, true, UpperBound);
> +  unsigned ED = Typo.edit_distance(Name, true, UpperBound) + DistOffset;
>   if (ED == 0)
>     return;
> 
> The DistOffset should be added into the UpperBound, so that the edit_distance 
> implementation can abort early if the resulting edit distance is going to be 
> too high.
> 
> +  // Only consider entities that haven't already been found.
> +  DeclContext *NDContext = ND->getDeclContext()->getPrimaryContext();
> +  iterator N = BestResults.find(ShortName);
> +  if (N != BestResults.end()) {
> +    for (DeclSpecList::iterator D = N->second.begin(), DEnd = 
> N->second.end();
> +         D != DEnd; ++D) {
> +      if (NDContext == D->first) return;
> +    }
> +  }
> 
> Should we use a SmallPtrSet for this, so that we get faster lookup if the 
> BestResults set happens to get large?
> 
> +void ScanNamespaceForCorrections(DeclContext *DC,
> +                                 TypoCorrectionConsumer &Consumer,
> +                                 unsigned DistanceOffset, bool DownOnly) {
> +
> +  if (DistanceOffset > Consumer.getBestEditDistance())
> +    return;
> +
> +  DeclContext *ParentDC = DC->getLookupParent();
> +
> +  for (DC = DC->getPrimaryContext(); DC != NULL; DC = DC->getNextContext()) {
> +    for (DeclContext::decl_iterator D = DC->decls_begin(),
> +         DEnd = DC->decls_end(); D != DEnd; ++D) {
> +      NamedDecl *ChildDecl = dyn_cast<NamedDecl>(D->getCanonicalDecl());
> +      NamespaceDecl *ChildND = dyn_cast_or_null<NamespaceDecl>(*D);
> +      if (ChildDecl)
> +        Consumer.FoundDecl(ChildDecl, NULL, false, DistanceOffset);
> +      if (ChildND)
> +        ScanNamespaceForCorrections(ChildND, Consumer,
> +                                    DistanceOffset + 1, true);
> +    }
> +  }
> +
> +  if (DownOnly || !ParentDC) return;
> +
> +  ScanNamespaceForCorrections(ParentDC, Consumer, DistanceOffset, false);
> +}
> 
> This will end up walking most of the translation unit. It's better than the 
> RecursiveASTVisitor, since it won't walk into function bodies or the 
> definitions of classes, but there's still some redundant work. I'll make a 
> concrete suggestion at the end of this reply.
> 
> +    for (DeclSpecList::iterator DSI = I->second.begin(),
> +                                DSIEnd = I->second.end();
> +         DSI != DSIEnd; /* Increment in loop, */) {
> +      // Perform name lookup on this name.
> +      IdentifierInfo *Name = &Context.Idents.get(I->getKey());
> +      LookupPotentialTypoResult(*this, Res, Name, S, SS,
> +                                DSI->second ? DSI->first : MemberContext,
> +                                EnteringContext, CTC);
> 
> Using DSI->first as the member context isn't quite what we want, because we 
> don't want to turn an unqualified lookup into a member lookup. Rather, we 
> want to turn  an unqualified lookup into a qualified lookup. So, I suggest 
> that you form a new nested-name-specifier (e.g., a new "SS") that provides 
> the minimum qualification needed to get from the current context over to the 
> result we found (this is related to my first comment). If that works, you 
> could write the results back to SS, so that our caller can go ahead and use 
> that nested-name-specifier for the rest of semantic analysis as if the user 
> had written it explicitly. (Naturally, callers would need some updating to 
> understand that this could happen). The key here is that, on successfully 
> returning an alternative result from typo correction, the name-lookup state 
> looks *exactly* as it would had the user typed the right thing.
> 
> Incidentally, I think this is the source of the 
> SemaObjC/synth-provisional-ivars.m failure, since we're allowing lookup into 
> an Objective-C class even though one can never qualify the name of an entity 
> in an Objective-C class. We shouldn't ever do qualifier checks in 
> (Objective-)C, although it's happening here.
> 
> Back to my comment about repeated work. The typo correction code for 
> unqualified lookup makes a pass over all of the identifiers known in the 
> translation unit, which includes all of the names of entities buried in 
> namespaces. So, when we also go walk the AST to find all of the namespaces 
> (and the members of those namespaces), so we end up hitting many of those 
> identifiers a second time, and traversing most of the AST in the process 
> (which is more expensive than just looking at the identifiers). So, here's a 
> concrete recommendation:
> 
>        1) Don't walk the namespaces at all; let the check against all of the 
> identifiers in the translation unit winnow the list down to the best 
> candidates based on name alone. We don't know which of these candidates is 
> visible, or how to get there.
>        2) When we're performing lookup for each candidate (e.g., the calls to 
> LookupPotentialTypoResult), first try unqualified lookup. If that fails, save 
> the candidate in a separate "try qualified name lookup" set.
>        3) If none of the candidates can be found by unqualified lookup, start 
> looking in each of the namespaces we know about to see if qualified name 
> lookup would work to find each of the candidates. You'll likely need a side 
> structure containing all of the (unique, canonicalized) NamespaceDecls in the 
> translation unit to make this efficient.
> 
> Implemented this way, there's no additional penalty for typo correction of 
> visible names. Moreover, we skip the AST walk entirely, and only perform 
> targeted lookups for names that we know exist (because they were in the 
> identifier table). This also permits a few more optimizations:
> 
>        - We could (roughly) order namespaces based on their depth, so that 
> we'll tend to visit the more top-level (and, therefore, more likely to have 
> short qualifiers) namespaces first. If the qualifier for a namespace would be 
> longer than the shortest qualifier we've found so far, we don't have to look 
> in that namespace.
> 
>        - For a given namespace, we could look for all of the candidates in 
> that namespace at the same time, improving locality (which is particularly 
> import when those lookups hit the disk in the PCH case).
> 
> Sorry for the long e-mail; you've got me thinking a lot about this :)
> 
>        - Doug
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] PATCH: Add support for C++ namespace-aware typo correcting

Reply via email to