Hi Richard,

If I read these numbers correctly, the hash table algorithm (with O(n) performance) takes about 1.6% to 2% longer than the control for runs 1-3, and shows hardly any noticeable difference on the Clang code base. Were runs 1-3 used in your earlier measurements, where the sorting-based approach took about 4% longer (or is that not the correct number)? I just want an idea of where we stand compared to the earlier measurements.
Ted

On Aug 28, 2012, at 3:53 PM, Richard Trieu <[email protected]> wrote:

> Timing information:
> Three clangs were run: clang with no changes (control), duplicate enum with
> PointerUnion (most recent patch), and duplicate enum with DenseMap without
> PointerUnion (next most recent patch). Each was run with -fsyntax-only and
> -Wduplicate-enum for the modified clangs. Runs 1, 2, and 3 are files with
> only enums. Run 4 is a preprocessed Clang.
>
> Key:
> name: Average (Min-Max)
>
> Run1:
> Control: 13.763 (13.66-14.14)
> PointerUnion: 14.046 (13.94-14.16)
> DenseMap: 14.304 (14.24-14.39)
>
> Run2:
> Control: 20.189 (20.1-20.31)
> PointerUnion: 20.514 (20.37-20.6)
> DenseMap: 20.635 (20.56-20.7)
>
> Run3:
> Control: 26.715 (26.66-26.8)
> PointerUnion: 26.928 (26.8-27.12)
> DenseMap: 27.13 (27.05-27.22)
>
> Run4:
> Control: 29.686 (28.98-30.39)
> PointerUnion: 29.706 (28.73-30.69)
> DenseMap: 29.952 (29.3-30.63)
>
> On Tue, Aug 28, 2012 at 2:12 PM, Ted Kremenek <[email protected]> wrote:
> How fast? (particularly compared to the first implementation, and relative
> to -fsyntax-only)
>
> On Aug 28, 2012, at 11:48 AM, Richard Trieu <[email protected]> wrote:
>
>> The new patch with PointerUnion and DenseMap is slightly faster than the
>> previous DenseMap patch.
>>
>> On Mon, Aug 27, 2012 at 9:51 PM, Ted Kremenek <[email protected]> wrote:
>> Thanks! Quick question before I review it in more detail: what are the
>> performance characteristics of this patch compared to the others?
>>
>> On Aug 27, 2012, at 11:35 AM, Richard Trieu <[email protected]> wrote:
>>
>>> Incorporated most of the suggestions into this patch. Still using a
>>> double pass over the constants for the reasons outlined below.
>>>
>>> On Fri, Aug 17, 2012 at 10:00 PM, Ted Kremenek <[email protected]> wrote:
>>> BTW, I wrote this two days ago. For some reason my mail client didn't
>>> send it out until now. My apologies for the delay.
>>>
>>> No worries.
Had some issues crop up with template diffing that took my time.
>>>
>>> On Aug 15, 2012, at 10:11 PM, Ted Kremenek <[email protected]> wrote:
>>>
>>>> On Aug 15, 2012, at 6:12 PM, Richard Trieu <[email protected]> wrote:
>>>>
>>>>> On Tue, Aug 14, 2012 at 9:48 PM, Ted Kremenek <[email protected]> wrote:
>>>>> On Aug 14, 2012, at 2:32 PM, Richard Trieu <[email protected]> wrote:
>>>>>
>>>>>> At a high level, I honestly find this logic to be more complicated
>>>>>> than I would have expected. The sorting seems unnecessary, and will
>>>>>> report diagnostics in an unnatural order (first based on enum constant
>>>>>> value, then on declaration order). A straight linear pass seems more
>>>>>> natural to me, and DenseMap is very efficient.
>>>>>> Is there a comparison between the different containers in LLVM and
>>>>>> the STL containers?
>>>>>
>>>>> This is a reasonable place to start:
>>>>>
>>>>> http://llvm.org/docs/ProgrammersManual.html#ds_map
>>>>>
>>>>> The key with DenseMap is that it is a probed hash table. There is one
>>>>> big allocation for the entire table, instead of a bunch of buckets.
>>>>> When applicable, it can be very fast, and it feels like the right data
>>>>> structure to use here.
>>>>>
>>>>> Duplicate enum detection, now with DenseMap. The DenseMap maps an
>>>>> int64_t to a vector pointer. 0 and 1 are special keys for the DenseMap,
>>>>> so two separate pointers are special-cased for them. The vector
>>>>> pointers are stored in another vector in declaration order. One pass is
>>>>> made over the enum constants to find ones without initializers. These
>>>>> are used to create vectors. A second pass through the constants
>>>>> populates the vectors. Finally, a pass over the vector of vectors is
>>>>> used to generate all the warnings and notes.
>>>>>
>>>>> Run time is fairly consistent with the sorted vector implementation,
>>>>> which is at most a 3% difference against the control.
>>>>> <duplicate-enum-densemap.patch>
>>>>
>>>> Thanks for working on this.
My main concern is that this patch now has a lot
>>>> of unnecessary malloc() traffic, which will certainly slow it down.
>>>> Comments inline:
>>>>
>>>>> +
>>>>> +static int64_t GetInt64(const llvm::APSInt& Val) {
>>>>> +  return Val.isSigned() ? Val.getSExtValue() : Val.getZExtValue();
>>>>> +}
>>>>> +
>>>>> +struct DenseMapInfoint64_t {
>>>>> +  static int64_t getEmptyKey() { return 0; }
>>>>> +  static int64_t getTombstoneKey() { return 1; }
>>>>> +  static unsigned getHashValue(const int64_t Val) {
>>>>> +    return (unsigned)(Val * 37);
>>>>> +  }
>>>>> +  static bool isEqual(const int64_t& LHS, const int64_t& RHS) {
>>>>> +    return LHS == RHS;
>>>>> +  }
>>>>> +};
>>>>
>>>> This trait class doesn't look like it was actually used. The DenseMap
>>>> below just uses the default trait for int64_t.
>>>>
>>>> I also still think we can do something a bit smarter here. What we need
>>>> to distinguish is whether or not a constant has appeared more than once.
>>>> We're saving a bit of memory on the keys, but spending that savings
>>>> elsewhere when we allocate the vectors unconditionally for each
>>>> constant.
>>>>
>>>>> +
>>>>> +// Emits a warning when an element is implicitly set to a value that
>>>>> +// a previous element has already been set to.
>>>>> +static void CheckForDuplicateEnumValues(Sema &S, Decl **Elements,
>>>>> +                                        unsigned NumElements,
>>>>> +                                        EnumDecl *Enum,
>>>>> +                                        QualType EnumType) {
>>>>> +  if (S.Diags.getDiagnosticLevel(diag::warn_duplicate_enum_values,
>>>>> +                                 Enum->getLocation()) ==
>>>>> +      DiagnosticsEngine::Ignored)
>>>>> +    return;
>>>>> +  // Avoid anonymous enums
>>>>> +  if (!Enum->getIdentifier())
>>>>> +    return;
>>>>> +
>>>>> +  // Only check for small enums.
>>>>> +  if (Enum->getNumPositiveBits() > 63 || Enum->getNumNegativeBits() > 64)
>>>>> +    return;
>>>>> +
>>>>> +  typedef llvm::SmallVector<EnumConstantDecl*, 4> SameValueVector;
>>>>> +  typedef llvm::DenseMap<int64_t, SameValueVector*> ValueToVectorMap;
>>>>> +  typedef llvm::SmallVector<SameValueVector*, 10> DoubleVector;
>>>>> +  ValueToVectorMap EnumMap;
>>>>> +  DoubleVector EnumVector;
>>>>> +  SameValueVector *ZeroVector = 0, *OneVector = 0;
>>>>
>>>> It took me a while to understand what this was doing, so I feel it could
>>>> really benefit from a comment. It also appears to result in a ton of
>>>> malloc traffic below. Here's my suggestion:
>>>>
>>>>   typedef llvm::SmallVector<EnumConstantDecl*, 3> ECDVector;
>>>>   typedef llvm::SmallVector<ECDVector *, 3> DuplicatesVector;
>>>>
>>>>   typedef llvm::PointerUnion<EnumConstantDecl*, ECDVector *> DeclOrVector;
>>>>   typedef llvm::DenseMap<int64_t, DeclOrVector> ValueToVectorMap;
>>>>
>>>>   DuplicatesVector DupVector;
>>>>   ValueToVectorMap EnumMap;
>>>>
>>>> The trick here is that the DenseMap maps from a constant to the first
>>>> EnumConstantDecl it encounters. Only if we encounter a second
>>>> EnumConstantDecl with the same enum value do we pay the cost of
>>>> allocating another vector. This drastically improves the common case, as
>>>> calling malloc() is really slow. Right now the code appears to be doing
>>>> a malloc() for every enum constant, which is going to really penalize us
>>>> here.
>>>>
>>>>> +
>>>>> +  for (unsigned i = 0; i < NumElements; ++i) {
>>>>> +    EnumConstantDecl *ECD = cast<EnumConstantDecl>(Elements[i]);
>>>>> +    if (!ECD) {
>>>>> +      for (DoubleVector::iterator I = EnumVector.begin(),
>>>>> +                                  E = EnumVector.end();
>>>>> +           I != E; ++I)
>>>>> +        delete *I;
>>>>> +      return;
>>>>> +    }
>>>>
>>>> I don't quite understand this loop through DoubleVector here, but it
>>>> looks like logic in case we want to return early and clean up. Is there
>>>> a case where the EnumConstantDecl can be null?
>>>
>>> According to ActOnEnumBody, the EnumConstantDecl is null if a diagnostic
>>> has previously been emitted for the constant. Since the enum is possibly
>>> ill-formed, we skip checking it.
>>>>
>>>>> +
>>>>> +    if (ECD->getInitExpr())
>>>>> +      continue;
>>>>> +
>>>>> +    int64_t Val = GetInt64(ECD->getInitVal());
>>>>> +
>>>>
>>>> Looks good.
>>>>
>>>>> +    if (Val == 0) {
>>>>> +      if (ZeroVector) continue;
>>>>> +      ZeroVector = new SameValueVector();
>>>>> +      ZeroVector->push_back(ECD);
>>>>> +      EnumVector.push_back(ZeroVector);
>>>>> +    } else if (Val == 1) {
>>>>> +      if (OneVector) continue;
>>>>> +      OneVector = new SameValueVector();
>>>>> +      OneVector->push_back(ECD);
>>>>> +      EnumVector.push_back(OneVector);
>>>>> +    } else {
>>>>> +      if (EnumMap.find(Val) != EnumMap.end())
>>>>> +        continue;
>>>>> +      SameValueVector *ValueVector = new SameValueVector();
>>>>> +      ValueVector->push_back(ECD);
>>>>> +      EnumVector.push_back(ValueVector);
>>>>> +      EnumMap.insert(std::make_pair(Val, ValueVector));
>>>>
>>>> The "find()" followed by the "insert()" is wasteful. It results in two
>>>> lookups into the hash table when we could have used just one. More on
>>>> that later.
>>>>
>>>>> +    }
>>>>> +  }
>>>>
>>>> IMO, this looks like a lot of complexity just to handle the fact that 0
>>>> and 1 are special values for the DenseMap. I don't really see this as
>>>> the right tradeoff; the code is more complicated, with marginal impact
>>>> on memory usage or performance.
>>>>
>>>> If you humor me for a bit, consider using something else for the key,
>>>> e.g.:
>>>>
>>>>   struct DupKey {
>>>>     int64_t val;
>>>>     bool isTombstoneOrEmptyKey;
>>>>   };
>>>>
>>>> The idea is that if 'isTombstoneOrEmptyKey' is true, we can use val = 0
>>>> or val = 1 to represent empty keys or tombstone entries. Otherwise,
>>>> it's an int64_t with the full range of values. We can define a DenseMap
>>>> trait to do the right thing.
Yes, this costs a tiny bit more in storage, but it
>>>> allows the data structure to handle the complete set of values in your
>>>> domain, instead of resorting to complicating the core algorithm. What I
>>>> see here now is essentially the same code duplicated twice, which makes
>>>> it harder to read and more error prone.
>>>>
>>>> If we use DupKey as our key for the DenseMap, we can instead do
>>>> something like this:
>>>>
>>>>   // Use default construction of 'entry'.
>>>>   DeclOrVector &entry = EnumMap[Val];
>>>>
>>>>   // Is this the first time we encountered this constant?
>>>>   if (entry.isNull()) {
>>>>     entry = ECD;
>>>>     continue;
>>>>   }
>>>>
>>>>   // Is this the second time we encountered this constant? If so,
>>>>   // push the previous decl encountered and the one just encountered
>>>>   // onto a vector of duplicates.
>>>>   if (EnumConstantDecl *D = entry.dyn_cast<EnumConstantDecl*>()) {
>>>>     ECDVector *Vec = new ECDVector();
>>>>     Vec->push_back(D);
>>>>     Vec->push_back(ECD);
>>>>
>>>>     // Update the entry to refer to the duplicates.
>>>>     entry = Vec;
>>>>
>>>>     // Store the duplicates in a vector we can consult later for
>>>>     // quick emission of diagnostics.
>>>>     DupVector.push_back(Vec);
>>>>
>>>>     // On to the next constant.
>>>>     continue;
>>>>   }
>>>>
>>>>   // Is this the third (or greater) time we encountered the constant?
>>>>   // If so, continue to add it to the existing vector.
>>>>   ECDVector *Vec = entry.get<ECDVector*>();
>>>>   Vec->push_back(ECD);
>>>>
>>>> With this code, we only allocate memory (beyond the DenseMap) when we
>>>> encounter a duplicate that would be worth reporting. In the common case,
>>>> the savings in malloc traffic should be noticeable.
>>>>
>>>> Notice also that I used:
>>>>
>>>>   // Use default construction of 'entry'.
>>>>   DeclOrVector &entry = EnumMap[Val];
>>>>
>>>> This results in a single lookup into the hashtable.
Since we plan on adding
>>>> a value for the key no matter what, this idiom lets the DenseMap
>>>> default-construct an entry if it doesn't exist. That gives us a single
>>>> hashtable lookup, after which we can modify the value in place. This is
>>>> obviously faster than doing the hashtable lookup twice.
>>>>
>>>>> +
>>>>> +  for (unsigned i = 0; i < NumElements; ++i) {
>>>>> +    EnumConstantDecl *ECD = cast<EnumConstantDecl>(Elements[i]);
>>>>> +    if (!ValidDuplicateEnum(ECD, Enum))
>>>>> +      continue;
>>>>> +
>>>>> +    int64_t Val = GetInt64(ECD->getInitVal());
>>>>> +
>>>>> +    if (Val == 0) {
>>>>> +      if (!ZeroVector || *ZeroVector->begin() == ECD)
>>>>> +        continue;
>>>>> +      ZeroVector->push_back(ECD);
>>>>> +    } else if (Val == 1) {
>>>>> +      if (!OneVector || *OneVector->begin() == ECD)
>>>>> +        continue;
>>>>> +      OneVector->push_back(ECD);
>>>>> +    } else {
>>>>> +      ValueToVectorMap::iterator I = EnumMap.find(Val);
>>>>> +      if (I == EnumMap.end())
>>>>> +        continue;
>>>>> +      SameValueVector *V = I->second;
>>>>> +      if (*V->begin() == ECD)
>>>>> +        continue;
>>>>> +      V->push_back(ECD);
>>>>> +    }
>>>>> +  }
>>>>
>>>> This second loop looks unnecessary. I think we can do everything we need
>>>> to count duplicates with one loop. Of course, ValidDuplicateEnum()
>>>> would need to be hoisted into the first loop.
>>>
>>> Using two traversals allows two things. First, the first element in the
>>> ECDVector will not have an initializer and will work with the warning;
>>> otherwise, the vector needs to be searched for a proper enum constant to
>>> use. Second, it prevents unneeded creation of ECDVectors. If we have
>>> enum A { A1 = 2, A2 = 2, A3 = 1, A4 = 1, A5 }, vectors for values 1 and
>>> 2 are created using a single pass, while only a vector for value 2 will
>>> be created using a double pass.
>>>>
>>>>> +
>>>>> +  for (DoubleVector::iterator DoubleVectorIter = EnumVector.begin(),
>>>>> +                              DoubleVectorEnd = EnumVector.end();
>>>>> +       DoubleVectorIter != DoubleVectorEnd; ++DoubleVectorIter) {
>>>>> +    SameValueVector *V = *DoubleVectorIter;
>>>>> +    if (V->size() == 1)
>>>>> +      continue;
>>>>> +
>>>>> +    SameValueVector::iterator I = V->begin();
>>>>> +    S.Diag((*I)->getLocation(), diag::warn_duplicate_enum_values)
>>>>> +        << (*I)->getName() << (*I)->getInitVal().toString(10)
>>>>> +        << (*I)->getSourceRange();
>>>>> +    ++I;
>>>>> +    for (SameValueVector::iterator E = V->end(); I != E; ++I)
>>>>> +      S.Diag((*I)->getLocation(), diag::note_duplicate_element)
>>>>> +          << (*I)->getName() << (*I)->getInitVal().toString(10)
>>>>> +          << (*I)->getSourceRange();
>>>>> +    delete V;
>>>>> +  }
>>>>
>>>> This is more or less the same; essentially it becomes:
>>>>
>>>>   for (DuplicatesVector::iterator I = DupVector.begin(),
>>>>                                   E = DupVector.end();
>>>>        I != E; ++I) {
>>>>     ECDVector *Vec = *I;
>>>>     // do the diagnostic logic ...
>>>>     delete *I;
>>>>   }
>>>>
>>>> Note that with my suggestions the vector has size on the order of the
>>>> number of duplicate constants, not the total number of constants. If
>>>> there are no duplicates, no work is required (including freeing memory).
>>>>
>>>>> +}
>>>>> +
>>>>>  void Sema::ActOnEnumBody(SourceLocation EnumLoc,
>>>>>                           SourceLocation LBraceLoc,
>>>>>                           SourceLocation RBraceLoc, Decl *EnumDeclX,
>>>>>                           Decl **Elements, unsigned NumElements,
>>>>> @@ -10709,6 +10868,7 @@
>>>>>    DeclsInPrototypeScope.push_back(Enum);
>>>>>
>>>>>    CheckForUniqueEnumValues(*this, Elements, NumElements, Enum, EnumType);
>>>>> +  CheckForDuplicateEnumValues(*this, Elements, NumElements, Enum,
>>>>> +                              EnumType);
>>>>>  }
>>>>>
>>>>>  Decl *Sema::ActOnFileScopeAsmDecl(Expr *expr,
>>>>
>>>> I know this may all be nit-picky, but I really think reducing the
>>>> malloc() traffic is worth looking at to get a real understanding of the
>>>> performance improvement that can be found here.
>>>>
>>>> Thanks for forging ahead on this.
>>>> _______________________________________________
>>>> cfe-commits mailing list
>>>> [email protected]
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>
>>> <duplicate-enum-densemap2.patch>
