Timing information: Three clangs were run, clang with no changes (control), duplicate enum with PointerUnion (most recent patch), duplicate enum with DenseMap without PointerUnion (next most recent patch). Each run with -fsyntax-only and -Wduplicate-enum for modified clangs. Runs 1, 2, and 3 are files with only enums. Run 4 is a preprocessed Clang.
Key: name: Average (Min-Max) Run1: Control: 13.763 (13.66-14.14) PointerUnion: 14.046 (13.94-14.16) DenseMap: 14.304 (14.24-14.39) Run2: Control: 20.189 (20.1-20.31) PointerUnion: 20.514 (20.37-20.6) DenseMap: 20.635 (20.56-20.7) Run3: Control: 26.715 (26.66-26.8) PointerUnion: 26.928 (26.8-27.12) DenseMap: 27.13 (27.05-27.22) Run4: Control: 29.686 (28.98-30.39) PointerUnion: 29.706 (28.73-30.69) DenseMap: 29.952 (29.3-30.63) On Tue, Aug 28, 2012 at 2:12 PM, Ted Kremenek <[email protected]> wrote: > How fast? (particularly compared to the first implementation, and > relative to -fsyntax-only) > > On Aug 28, 2012, at 11:48 AM, Richard Trieu <[email protected]> wrote: > > New patch with PointerUnion and DenseMap is slightly faster than the > previous DenseMap patch. > > On Mon, Aug 27, 2012 at 9:51 PM, Ted Kremenek <[email protected]> wrote: > >> Thanks! Quick question before I review it in more details: what is the >> performance characteristics of this patch compared to the others? >> >> On Aug 27, 2012, at 11:35 AM, Richard Trieu <[email protected]> wrote: >> >> Incorporated most of the suggestions into this patch. Still using a >> double pass over the constants for the reasons outlined below. >> >> On Fri, Aug 17, 2012 at 10:00 PM, Ted Kremenek <[email protected]>wrote: >> >>> BTW, I wrote this two days ago. For some reason my mail client didn't >>> send it out until now. My apologies for the delay. >>> >> No worries. Had some issues that cropped up on template diffing that >> took my time. >> >>> >>> On Aug 15, 2012, at 10:11 PM, Ted Kremenek <[email protected]> wrote: >>> >>> On Aug 15, 2012, at 6:12 PM, Richard Trieu <[email protected]> wrote: >>> >>> On Tue, Aug 14, 2012 at 9:48 PM, Ted Kremenek <[email protected]>wrote: >>> >>>> On Aug 14, 2012, at 2:32 PM, Richard Trieu <[email protected]> wrote: >>>> >>>> At a high level, I honestly find this logic to be more complicated >>>>> than I would have expected. The sorting seems unnecessary, and will >>>>> report >>>>> diagnostics in an unnatural order (first based on enum constant value, >>>>> then >>>>> on declaration order). A straight linear pass seems more naturally to me, >>>>> and DenseMap is very efficient. >>>>> >>>> Is there a comparison between the different containers in LLVM and the >>>> STL containers? >>>> >>>> >>>> This is a reasonable place to start: >>>> >>>> http://llvm.org/docs/ProgrammersManual.html#ds_map >>>> >>>> The key with DenseMap is that it is probed hashtable. There is one big >>>> allocation for the entire table, instead of a bunch of buckets. When >>>> applicable, it can be very fast, and feels like the right data structure to >>>> use here. >>>> >>> >>> Duplicate enum detection, now with DenseMap. The DenseMap maps a >>> int64_t to a vector pointer. 0 and 1 were special keys for the DenseMap, >>> so two separate pointers special cased for them. The vectors pointers are >>> stored in another vector in declaration order. One pass is made over the >>> enums to find ones without initializers. These are used to create vectors. >>> A second pass through the enums populates the vectors. Finally, a pass >>> over the vector of vectors is used to generate all the warnings and notes. >>> >>> Run time is fairly consistent with the sorted vector implementation, >>> which is max %3 difference against control. >>> <duplicate-enum-densemap.patch> >>> >>> >>> Thanks for working on this. My main concern is this patch now has a lot >>> of unnecessary malloc() traffic, which will certainly slow it down. >>> Comments inline: >>> >>> + >>> +static int64_t GetInt64(const llvm::APSInt& Val) { >>> + return Val.isSigned() ? Val.getSExtValue() : Val.getZExtValue(); >>> +} >>> + >>> +struct DenseMapInfoint64_t { >>> + static int64_t getEmptyKey() { return 0; } >>> + static int64_t getTombstoneKey() { return 1; } >>> + static unsigned getHashValue(const int64_t Val) { >>> + return (unsigned)(Val * 37); >>> + } >>> + static bool isEqual(const int64_t& LHS, const int64_t& RHS) { >>> + return LHS == RHS; >>> + } >>> +}; >>> >>> >>> This trait class doesn't look like it was actually used. The DenseMap >>> below just uses the default trait for int64_t. >>> >>> I also still think we can so something a bit smarter here. What I think >>> we need to distinguish between is whether or not a constant has appeared >>> more than once. We're saving a bit of memory on the keys, but spending >>> that savings elsewhere when we allocate the vectors unconditionally for >>> each constant. >>> >>> + >>> +// Emits a warning when an element is implicitly set a value that >>> +// a previous element has already been set to. >>> +static void CheckForDuplicateEnumValues(Sema &S, Decl **Elements, >>> + unsigned NumElements, EnumDecl >>> *Enum, >>> + QualType EnumType) { >>> + if (S.Diags.getDiagnosticLevel(diag::warn_duplicate_enum_values, >>> + Enum->getLocation()) == >>> + DiagnosticsEngine::Ignored) >>> + return; >>> + // Avoid anonymous enums >>> + if (!Enum->getIdentifier()) >>> + return; >>> + >>> + // Only check for small enums. >>> + if (Enum->getNumPositiveBits() > 63 || Enum->getNumNegativeBits() > >>> 64) >>> + return; >>> + >>> + typedef llvm::SmallVector<EnumConstantDecl*, 4> SameValueVector; >>> + typedef llvm::DenseMap<int64_t, SameValueVector*> ValueToVectorMap; >>> + typedef llvm::SmallVector<SameValueVector*, 10> DoubleVector; >>> + ValueToVectorMap EnumMap; >>> + DoubleVector EnumVector; >>> + SameValueVector *ZeroVector = 0, *OneVector = 0; >>> >>> >>> It took me a while to understand what this was doing, so I feel it could >>> really benefit from a comment. This also appears to result in a ton of >>> malloc traffic below. Here's my suggestion: >>> >>> typedef llvm::SmallVector<EnumConstantDecl*, 3> ECDVector; >>> typedef llvm::SmallVector<ECDVector *, 3> DuplicatesVector; >>> >>> typedef llvm::PointerUnion<EnumConstantDecl*, ECDVector *> >>> DeclOrVector; >>> typedef llvm::DenseMap<int64_t, DeclOrVector> ValueToVectorMap; >>> >>> DuplicatesVector DupVector; >>> ValueToVectorMap EnumMap; >>> >>> The trick here is that the DenseMap maps from a constant to the first >>> EnumConstantDecl it encounters. Only if we encounter a second >>> EnumConstantDecl with the same enum value do we pay the cost of allocating >>> another vector. This will drastically optimize in the common case, as >>> calling malloc() is really slow. Right now the code appears to be doing a >>> malloc() for every enum constant, which is going to really penalize us here. >>> >>> + >>> + for (unsigned i = 0; i < NumElements; ++i) { >>> + EnumConstantDecl *ECD = cast<EnumConstantDecl>(Elements[i]); >>> + if (!ECD) { >>> + for (DoubleVector::iterator I = EnumVector.begin(), E = >>> EnumVector.end(); >>> + I != E; ++I) >>> + delete *I; >>> + return; >>> + } >>> >>> >>> I don't quite understand this loop through DoubleVector here, but it >>> looks like logic in case we want to return early and cleanup. Is there a >>> case where the EnumConstantDecl can be null? >>> >>> According to ActOnEnumBody, EnumConstantDecl is null if a diagnostic has >> previously been emitted for the constant. Since the enum >> is possibly ill-formed, skip checking it. >> >>> >>> + >>> + if (ECD->getInitExpr()) >>> + continue; >>> + >>> + int64_t Val = GetInt64(ECD->getInitVal()); >>> + >>> >>> >>> Looks good. >>> >>> + if (Val == 0) { >>> + if (ZeroVector) continue; >>> + ZeroVector = new SameValueVector(); >>> + ZeroVector->push_back(ECD); >>> + EnumVector.push_back(ZeroVector); >>> + } else if (Val == 1) { >>> + if (OneVector) continue; >>> + OneVector = new SameValueVector(); >>> + OneVector->push_back(ECD); >>> + EnumVector.push_back(OneVector); >>> + } else { >>> + if (EnumMap.find(Val) != EnumMap.end()) >>> + continue; >>> + SameValueVector *ValueVector = new SameValueVector(); >>> + ValueVector->push_back(ECD); >>> + EnumVector.push_back(ValueVector); >>> + EnumMap.insert(std::make_pair(Val, ValueVector)); >>> >>> >>> The "find()" followed by the "insert()" is wasteful. It results in two >>> lookups to the hash table when we could have just used one. More on that >>> later. >>> >>> + } >>> + } >>> >>> >>> IMO, this looks like a lot of complexity just to handle the fact that 0 >>> and 1 are special values for the DenseMap. I don't really see this as the >>> right tradeoff; the code is more complicated with marginal impact on memory >>> usage or performance. >>> >>> If you humor me for a bit, consider using something else for the key, >>> e.g.: >>> >>> struct DupKey { >>> int64_t val; >>> bool isTombstoneOrEmptyKey; >>> }; >>> >>> The idea is if 'isTombStoneOrEmptyKey' is true, we can use val = 0 or >>> val = 1 to represent empty keys or tombstone entries. Otherwise, it's an >>> int64_t, with the full range of values. We can define a DenseMap trait to >>> do the right thing. Yes, this costs a tiny bit more in storage, but it >>> allows the data structure to handle the complete set of values in your >>> domain, instead of resorting to complicating the core algorithm. What I >>> see here now is the same code essentially duplicated twice, which makes it >>> harder to read and more error prone. >>> >>> If we use DupKey as our key for the DenseMap, we can instead do >>> something like this: >>> >>> DeclOrVector &entry = EnumMap[Val]; // Use default construction of >>> 'entry'. >>> // Is the first time we encountered this constant? >>> if (entry.isNull()) { >>> entry = ECD; >>> continue; >>> } >>> // Is this the second time we encountered this constant? If so, >>> // push the previous decl encountered and the one just encountered >>> // to a vector of duplicates. >>> if (EnumConstantDecl *D = entry.dyn_cast<EnumConstantDecl*>()) { >>> ECDVector *Vec = new ECDVector(); >>> Vec->push_back(D); >>> Vec->push_back(ECD); >>> >>> // Update the entry to refer to the duplicates. >>> entry = Vec; >>> >>> // Store the duplicates in a vector we can consult later for >>> // quick emission of diagnostics. >>> DupVector.push_back(Vec); >>> >>> // On to the next constant. >>> continue; >>> } >>> // Is this the third (or greater) time we encountered the constant? >>> If so, >>> // continue to add it to the existing vector. >>> ECDVector *Vec = entry.get<ECDVector*>(); >>> Vec->push_back(ECD); >>> >>> >>> With this code, we only allocate memory (beyond the DenseMap) when we >>> encounter a duplicate that would be worth reporting. In the common case, >>> this savings in malloc traffic should be noticeable. >>> >>> Notice also that I used: >>> >>> DeclOrVector &entry = EnumMap[Val]; // Use default construction of >>> 'entry'. >>> >>> This results in a single lookup in the hashtable. Since we plan on >>> adding a value for a key no matter what, by using this idiom we allow the >>> DenseMap to default construct an entry if it doesn't exist. This results >>> in a single hashtable lookup, from which we can modify the value in place. >>> This is obviously faster than doing a hashtable lookup twice. >>> >>> + >>> + for (unsigned i = 0; i < NumElements; ++i) { >>> + EnumConstantDecl *ECD = cast<EnumConstantDecl>(Elements[i]); >>> + if (!ValidDuplicateEnum(ECD, Enum)) >>> + continue; >>> + >>> + int64_t Val = GetInt64(ECD->getInitVal()); >>> + >>> + if (Val == 0) { >>> + if (!ZeroVector || *ZeroVector->begin() == ECD) >>> + continue; >>> + ZeroVector->push_back(ECD); >>> + } else if (Val == 1) { >>> + if (!OneVector || *OneVector->begin() == ECD) >>> + continue; >>> + OneVector->push_back(ECD); >>> + } else { >>> + ValueToVectorMap::iterator I = EnumMap.find(Val); >>> + if (I == EnumMap.end()) >>> + continue; >>> + SameValueVector *V = I->second; >>> + if (*V->begin() == ECD) >>> + continue; >>> + V->push_back(ECD); >>> + } >>> + } >>> >>> >>> This second loop looks unnecessary. I think we can do everything we >>> need to count duplicates with one loop. Of course the ValidDuplicateEnum() >>> would need to be hoisted to the first loop. >>> >>> Using two traverses allows two things to happen. One, the first element >> in the ECDVector will not have an initializer and will work with the >> warning. Otherwise, the vector needs to be searched for a proper enum >> constant to use. Two, it prevents unneeded creation of ECDVectors. If we >> have enum A { A1 = 2, A2 = 2, A3 = 1, A4 = 1, A5}; vectors for values 1 and >> 2 are created using a single pass while only a vector for 2 will be created >> using a double pass. >> >>> >>> + >>> + for (DoubleVector::iterator DoubleVectorIter = EnumVector.begin(), >>> + DoubleVectorEnd = EnumVector.end(); >>> + DoubleVectorIter != DoubleVectorEnd; ++DoubleVectorIter) { >>> + SameValueVector *V = *DoubleVectorIter; >>> + if (V->size() == 1) >>> + continue; >>> + >>> + SameValueVector::iterator I = V->begin(); >>> + S.Diag((*I)->getLocation(), diag::warn_duplicate_enum_values) >>> + << (*I)->getName() << (*I)->getInitVal().toString(10) >>> + << (*I)->getSourceRange(); >>> + ++I; >>> + for (SameValueVector::iterator E = V->end(); I != E; ++I) >>> + S.Diag((*I)->getLocation(), diag::note_duplicate_element) >>> + << (*I)->getName() << (*I)->getInitVal().toString(10) >>> + << (*I)->getSourceRange(); >>> + delete V; >>> + } >>> >>> >>> >>> This is more or less the same, essentially it becomes: >>> >>> for (DuplicateVector::iterator I = DupVector.begin(), E = >>> DupVector.end(); I != E; ++I) { >>> ECDVector *Vec = *I; >>> // do the diagnostic logic ... >>> delete *I; >>> } >>> >>> Note that with my suggestions the vector has size on order of the number >>> of duplicate constants, not the number of total constants. If there are no >>> duplicates, no work is required (including free'ing memory). >>> >>> +} >>> + >>> void Sema::ActOnEnumBody(SourceLocation EnumLoc, SourceLocation >>> LBraceLoc, >>> SourceLocation RBraceLoc, Decl *EnumDeclX, >>> Decl **Elements, unsigned NumElements, >>> @@ -10709,6 +10868,7 @@ >>> DeclsInPrototypeScope.push_back(Enum); >>> >>> CheckForUniqueEnumValues(*this, Elements, NumElements, Enum, >>> EnumType); >>> + CheckForDuplicateEnumValues(*this, Elements, NumElements, Enum, >>> EnumType); >>> } >>> >>> Decl *Sema::ActOnFileScopeAsmDecl(Expr *expr, >>> >>> >>> I know this may all be nit-picky, but I really think trying to reduce >>> the malloc() traffic is worth looking at to get a real understanding of the >>> performance improvement that can be found here. >>> >>> Thanks for forging ahead on this. >>> _______________________________________________ >>> cfe-commits mailing list >>> [email protected] >>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits >>> >>> >>> >> <duplicate-enum-densemap2.patch> >> >> >> > >
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
