Hi Richard,

If I read these numbers correctly, the hash table algorithm (with O(n) 
performance) takes about 1.6-2% more than the control for runs 1-3, and the 
difference is hardly noticeable for the clang code base.  Were runs 1-3 used 
in your earlier measurements, where the sorting-based approach took about 4% 
longer (or is that not the correct number)?  I just want an idea of where we 
are compared to the earlier measurements.
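(My arithmetic from the numbers below: PointerUnion vs. control is 14.046 vs. 
13.763 in Run 1, about +2.1%, and 20.514 vs. 20.189 in Run 2, about +1.6%.)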

Ted

On Aug 28, 2012, at 3:53 PM, Richard Trieu <[email protected]> wrote:

> Timing information:
> Three clangs were run: clang with no changes (control), duplicate enum with 
> PointerUnion (most recent patch), and duplicate enum with DenseMap without 
> PointerUnion (next most recent patch).  Each was run with -fsyntax-only, plus 
> -Wduplicate-enum for the modified clangs.  Runs 1, 2, and 3 are files 
> containing only enums.  Run 4 is a preprocessed Clang.
> 
> Key:
> name: Average (Min-Max)
> 
> Run1:
> Control: 13.763 (13.66-14.14)
> PointerUnion: 14.046 (13.94-14.16)
> DenseMap: 14.304 (14.24-14.39)
> 
> Run2:
> Control: 20.189 (20.1-20.31)
> PointerUnion: 20.514 (20.37-20.6)
> DenseMap: 20.635 (20.56-20.7)
> 
> Run3:
> Control: 26.715 (26.66-26.8)
> PointerUnion: 26.928 (26.8-27.12)
> DenseMap: 27.13 (27.05-27.22)
> 
> Run4:
> Control: 29.686 (28.98-30.39)
> PointerUnion: 29.706 (28.73-30.69)
> DenseMap: 29.952 (29.3-30.63)
> 
> On Tue, Aug 28, 2012 at 2:12 PM, Ted Kremenek <[email protected]> wrote:
> How fast?  (particularly compared to the first implementation, and relative 
> to -fsyntax-only)
> 
> On Aug 28, 2012, at 11:48 AM, Richard Trieu <[email protected]> wrote:
> 
>> New patch with PointerUnion and DenseMap is slightly faster than the 
>> previous DenseMap patch.
>> 
>> On Mon, Aug 27, 2012 at 9:51 PM, Ted Kremenek <[email protected]> wrote:
>> Thanks!  Quick question before I review it in more detail: what are the 
>> performance characteristics of this patch compared to the others?
>> 
>> On Aug 27, 2012, at 11:35 AM, Richard Trieu <[email protected]> wrote:
>> 
>>> Incorporated most of the suggestions into this patch.  Still using a double 
>>> pass over the constants for the reasons outlined below.
>>> 
>>> On Fri, Aug 17, 2012 at 10:00 PM, Ted Kremenek <[email protected]> wrote:
>>> BTW, I wrote this two days ago.  For some reason my mail client didn't send 
>>> it out until now.  My apologies for the delay.
>>> No worries.  Had some issues crop up with template diffing that took up my 
>>> time.
>>> 
>>> On Aug 15, 2012, at 10:11 PM, Ted Kremenek <[email protected]> wrote:
>>> 
>>>> On Aug 15, 2012, at 6:12 PM, Richard Trieu <[email protected]> wrote:
>>>> 
>>>>> On Tue, Aug 14, 2012 at 9:48 PM, Ted Kremenek <[email protected]> wrote:
>>>>> On Aug 14, 2012, at 2:32 PM, Richard Trieu <[email protected]> wrote:
>>>>> 
>>>>>> At a high level, I honestly find this logic to be more complicated than 
>>>>>> I would have expected.  The sorting seems unnecessary, and will report 
>>>>>> diagnostics in an unnatural order (first based on enum constant value, 
>>>>>> then on declaration order).  A straight linear pass seems more natural 
>>>>>> to me, and DenseMap is very efficient.
>>>>>> Is there a comparison between the different containers in LLVM and the 
>>>>>> STL containers?
>>>>> 
>>>>> This is a reasonable place to start:
>>>>> 
>>>>>   http://llvm.org/docs/ProgrammersManual.html#ds_map
>>>>> 
>>>>> The key with DenseMap is that it is probed hashtable.  There is one big 
>>>>> allocation for the entire table, instead of a bunch of buckets.  When 
>>>>> applicable, it can be very fast, and feels like the right data structure 
>>>>> to use here.
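>>>>> 
>>>>> For a rough feel of the interface (just a sketch, not code from the patch; 
>>>>> 'Seen' is a made-up name):
>>>>> 
>>>>>   llvm::DenseMap<int64_t, EnumConstantDecl *> Seen; // one contiguous table
>>>>>   Seen[Val] = ECD;                            // insert or update in place
>>>>>   EnumConstantDecl *Prev = Seen.lookup(Val);  // returns null if absent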
>>>>> 
>>>>> Duplicate enum detection, now with DenseMap.  The DenseMap maps an int64_t 
>>>>> to a vector pointer.  0 and 1 are special keys for the DenseMap, so two 
>>>>> separate pointers are special-cased for them.  The vector pointers are 
>>>>> stored in another vector in declaration order.  One pass is made over the 
>>>>> enum constants to find the ones without initializers; these are used to 
>>>>> create vectors.  A second pass through the constants populates the 
>>>>> vectors.  Finally, a pass over the vector of vectors generates all the 
>>>>> warnings and notes.
>>>>> 
>>>>> Run time is fairly consistent with the sorted vector implementation, 
>>>>> which is at most a 3% difference against control.
>>>>> <duplicate-enum-densemap.patch>
>>>> 
>>>> Thanks for working on this.  My main concern is that this patch now has a 
>>>> lot of unnecessary malloc() traffic, which will certainly slow it down.  
>>>> Comments inline:
>>>> 
>>>>> +
>>>>> +static int64_t GetInt64(const llvm::APSInt& Val) {
>>>>> +  return  Val.isSigned() ? Val.getSExtValue() : Val.getZExtValue();
>>>>> +}
>>>>> +
>>>>> +struct DenseMapInfoint64_t {
>>>>> +  static int64_t getEmptyKey() { return 0; }
>>>>> +  static int64_t getTombstoneKey() { return 1; }
>>>>> +  static unsigned getHashValue(const int64_t Val) {
>>>>> +    return (unsigned)(Val * 37);
>>>>> +  }
>>>>> +  static bool isEqual(const int64_t& LHS, const int64_t& RHS) {
>>>>> +    return LHS == RHS;
>>>>> +  }
>>>>> +};
>>>> 
>>>> This trait class doesn't look like it was actually used.  The DenseMap 
>>>> below just uses the default trait for int64_t.
>>>> 
>>>> I also still think we can do something a bit smarter here.  What we need 
>>>> to distinguish is whether or not a constant has appeared more than once.  
>>>> We're saving a bit of memory on the keys, but spending that savings 
>>>> elsewhere when we allocate the vectors unconditionally for each constant.
>>>> 
>>>>> +
>>>>> +// Emits a warning when an element is implicitly set a value that
>>>>> +// a previous element has already been set to.
>>>>> +static void CheckForDuplicateEnumValues(Sema &S, Decl **Elements,
>>>>> +                                        unsigned NumElements, EnumDecl 
>>>>> *Enum,
>>>>> +                                        QualType EnumType) {
>>>>> +  if (S.Diags.getDiagnosticLevel(diag::warn_duplicate_enum_values,
>>>>> +                                 Enum->getLocation()) ==
>>>>> +      DiagnosticsEngine::Ignored)
>>>>> +    return;
>>>>> +  // Avoid anonymous enums
>>>>> +  if (!Enum->getIdentifier())
>>>>> +    return;
>>>>> +
>>>>> +  // Only check for small enums.
>>>>> +  if (Enum->getNumPositiveBits() > 63 || Enum->getNumNegativeBits() > 64)
>>>>> +    return;
>>>>> +
>>>>> +  typedef llvm::SmallVector<EnumConstantDecl*, 4> SameValueVector;
>>>>> +  typedef llvm::DenseMap<int64_t, SameValueVector*> ValueToVectorMap;
>>>>> +  typedef llvm::SmallVector<SameValueVector*, 10> DoubleVector;
>>>>> +  ValueToVectorMap EnumMap;
>>>>> +  DoubleVector EnumVector;
>>>>> +  SameValueVector *ZeroVector = 0, *OneVector = 0;
>>>> 
>>>> It took me a while to understand what this was doing, so I feel it could 
>>>> really benefit from a comment.  This also appears to result in a ton of 
>>>> malloc traffic below.  Here's my suggestion:
>>>> 
>>>>   typedef llvm::SmallVector<EnumConstantDecl*, 3> ECDVector;
>>>>   typedef llvm::SmallVector<ECDVector *, 3> DuplicatesVector;
>>>> 
>>>>   typedef llvm::PointerUnion<EnumConstantDecl*, ECDVector *> DeclOrVector;
>>>>   typedef llvm::DenseMap<int64_t, DeclOrVector> ValueToVectorMap;
>>>> 
>>>>   DuplicatesVector DupVector;
>>>>   ValueToVectorMap EnumMap;
>>>> 
>>>> The trick here is that the DenseMap maps from a constant to the first 
>>>> EnumConstantDecl it encounters.  Only if we encounter a second 
>>>> EnumConstantDecl with the same enum value do we pay the cost of allocating 
>>>> another vector.  This will drastically speed up the common case, as 
>>>> calling malloc() is really slow.  Right now the code appears to be doing a 
>>>> malloc() for every enum constant, which is going to really penalize us 
>>>> here.
>>>> 
>>>>> +
>>>>> +  for (unsigned i = 0; i < NumElements; ++i) {
>>>>> +    EnumConstantDecl *ECD = cast<EnumConstantDecl>(Elements[i]);
>>>>> +    if (!ECD) {
>>>>> +      for (DoubleVector::iterator I = EnumVector.begin(), E = 
>>>>> EnumVector.end();
>>>>> +           I != E; ++I)
>>>>> +        delete *I;
>>>>> +      return;
>>>>> +    }
>>>> 
>>>> I don't quite understand this loop through DoubleVector here, but it looks 
>>>> like cleanup logic in case we want to return early.  Is there a case 
>>>> where the EnumConstantDecl can be null?
>>> 
>>> According to ActOnEnumBody, the EnumConstantDecl is null if a diagnostic 
>>> has previously been emitted for the constant.  Since the enum is possibly 
>>> ill-formed, we skip checking it.
>>>> 
>>>>> +
>>>>> +    if (ECD->getInitExpr())
>>>>> +      continue;
>>>>> +
>>>>> +    int64_t Val = GetInt64(ECD->getInitVal());
>>>>> +
>>>> 
>>>> Looks good.
>>>> 
>>>>> +    if (Val == 0) {
>>>>> +      if (ZeroVector) continue;
>>>>> +      ZeroVector = new SameValueVector();
>>>>> +      ZeroVector->push_back(ECD);
>>>>> +      EnumVector.push_back(ZeroVector);
>>>>> +    } else if (Val == 1) {
>>>>> +      if (OneVector) continue;
>>>>> +      OneVector = new SameValueVector();
>>>>> +      OneVector->push_back(ECD);
>>>>> +      EnumVector.push_back(OneVector);
>>>>> +    } else {
>>>>> +      if (EnumMap.find(Val) != EnumMap.end())
>>>>> +        continue;
>>>>> +      SameValueVector *ValueVector = new SameValueVector();
>>>>> +      ValueVector->push_back(ECD);
>>>>> +      EnumVector.push_back(ValueVector);
>>>>> +      EnumMap.insert(std::make_pair(Val, ValueVector));
>>>> 
>>>> The "find()" followed by the "insert()" is wasteful.  It results in two 
>>>> lookups to the hash table when we could have just used one.  More on that 
>>>> later.
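>>>> 
>>>> For comparison, a minimal sketch of the two idioms using the patch's 
>>>> current types (illustrative only):
>>>> 
>>>>   // Two hash lookups: one for find(), one for insert().
>>>>   if (EnumMap.find(Val) == EnumMap.end())
>>>>     EnumMap.insert(std::make_pair(Val, new SameValueVector()));
>>>> 
>>>>   // One hash lookup: operator[] default-constructs the mapped value
>>>>   // (a null pointer here) if the key is missing, and we fill it in place.
>>>>   SameValueVector *&Slot = EnumMap[Val];
>>>>   if (!Slot)
>>>>     Slot = new SameValueVector();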
>>>> 
>>>>> +    }
>>>>> +  }
>>>> 
>>>> IMO, this looks like a lot of complexity just to handle the fact that 0 
>>>> and 1 are special values for the DenseMap.  I don't really see this as the 
>>>> right tradeoff; the code is more complicated with marginal impact on 
>>>> memory usage or performance.
>>>> 
>>>> If you humor me for a bit, consider using something else for the key, e.g.:
>>>> 
>>>> struct DupKey {
>>>>   int64_t val;
>>>>   bool isTombstoneOrEmptyKey;
>>>> };
>>>> 
>>>> The idea is that if 'isTombstoneOrEmptyKey' is true, we can use val = 0 or 
>>>> val = 1 to represent empty keys or tombstone entries.  Otherwise, it's an 
>>>> int64_t with the full range of values.  We can define a DenseMap trait to 
>>>> do the right thing.  Yes, this costs a tiny bit more in storage, but it 
>>>> allows the data structure to handle the complete set of values in your 
>>>> domain instead of complicating the core algorithm.  What I see here now is 
>>>> essentially the same code duplicated, which makes it harder to read and 
>>>> more error-prone.
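>>>> 
>>>> Roughly, such a trait could look like this (just a sketch; the struct name 
>>>> and hash function are placeholders):
>>>> 
>>>>   struct DenseMapInfoDupKey {
>>>>     static DupKey getEmptyKey() { DupKey K = { 0, true }; return K; }
>>>>     static DupKey getTombstoneKey() { DupKey K = { 1, true }; return K; }
>>>>     static unsigned getHashValue(const DupKey &Key) {
>>>>       return (unsigned)(Key.val * 37);
>>>>     }
>>>>     static bool isEqual(const DupKey &LHS, const DupKey &RHS) {
>>>>       return LHS.isTombstoneOrEmptyKey == RHS.isTombstoneOrEmptyKey &&
>>>>              LHS.val == RHS.val;
>>>>     }
>>>>   };
>>>> 
>>>>   // used as, e.g., llvm::DenseMap<DupKey, DeclOrVector, DenseMapInfoDupKey>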
>>>> 
>>>> If we use DupKey as our key for the DenseMap, we can instead do something 
>>>> like this:
>>>> 
>>>>    DeclOrVector &entry = EnumMap[Val];  // Use default construction of 'entry'.
>>>>    // Is the first time we encountered this constant?
>>>>    if (entry.isNull()) {
>>>>      entry = ECD;
>>>>      continue;
>>>>    }
>>>>    // Is this the second time we encountered this constant?  If so,
>>>>    // push the previous decl encountered and the one just encountered
>>>>    // to a vector of duplicates.
>>>>    if (EnumConstantDecl *D = entry.dyn_cast<EnumConstantDecl*>()) {
>>>>      ECDVector *Vec = new ECDVector();
>>>>      Vec->push_back(D);
>>>>      Vec->push_back(ECD);
>>>>      
>>>>      // Update the entry to refer to the duplicates.
>>>>      entry = Vec;
>>>> 
>>>>      // Store the duplicates in a vector we can consult later for
>>>>      // quick emission of diagnostics.
>>>>      DupVector.push_back(Vec);
>>>> 
>>>>      // On to the next constant.
>>>>      continue;
>>>>    }
>>>>    // Is this the third (or greater) time we encountered the constant?  If so,
>>>>    // continue to add it to the existing vector.
>>>>    ECDVector *Vec = entry.get<ECDVector*>();
>>>>    Vec->push_back(ECD);
>>>> 
>>>> 
>>>> With this code, we only allocate memory (beyond the DenseMap) when we 
>>>> encounter a duplicate that would be worth reporting.  In the common case, 
>>>> this savings in malloc traffic should be noticeable.
>>>> 
>>>> Notice also that I used:
>>>> 
>>>>      DeclOrVector &entry = EnumMap[Val];  // Use default construction of 'entry'.
>>>> 
>>>> This results in a single lookup in the hashtable.  Since we plan on adding 
>>>> a value for the key no matter what, this idiom lets the DenseMap 
>>>> default-construct an entry if one doesn't exist, and we can then modify 
>>>> the value in place.  That is obviously faster than doing the hashtable 
>>>> lookup twice.
>>>> 
>>>>> +
>>>>> +  for (unsigned i = 0; i < NumElements; ++i) {
>>>>> +    EnumConstantDecl *ECD = cast<EnumConstantDecl>(Elements[i]);
>>>>> +    if (!ValidDuplicateEnum(ECD, Enum))
>>>>> +      continue;
>>>>> +
>>>>> +    int64_t Val = GetInt64(ECD->getInitVal());
>>>>> +
>>>>> +    if (Val == 0) {
>>>>> +      if (!ZeroVector || *ZeroVector->begin() == ECD)
>>>>> +        continue;
>>>>> +      ZeroVector->push_back(ECD);
>>>>> +    } else if (Val == 1) {
>>>>> +      if (!OneVector || *OneVector->begin() == ECD)
>>>>> +        continue;
>>>>> +      OneVector->push_back(ECD);
>>>>> +    } else {
>>>>> +      ValueToVectorMap::iterator I = EnumMap.find(Val);
>>>>> +      if (I == EnumMap.end())
>>>>> +        continue;
>>>>> +      SameValueVector *V = I->second;
>>>>> +      if (*V->begin() == ECD)
>>>>> +        continue;
>>>>> +      V->push_back(ECD);
>>>>> +    }
>>>>> +  }
>>>> 
>>>> This second loop looks unnecessary.  I think we can do everything we need 
>>>> to count duplicates with one loop.  Of course the ValidDuplicateEnum() 
>>>> would need to be hoisted to the first loop.
>>> 
>>> Using two passes accomplishes two things.  One, the first element in the 
>>> ECDVector will not have an initializer, so it works with the warning text; 
>>> otherwise the vector would need to be searched for a suitable enum constant 
>>> to use.  Two, it prevents unneeded creation of ECDVectors.  If we have 
>>> enum A { A1 = 2, A2 = 2, A3 = 1, A4 = 1, A5 };  a single pass creates 
>>> vectors for values 1 and 2, while a double pass creates a vector only 
>>> for 2.
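>>> 
>>> To spell that out (my reading of the example, with the implicit value 
>>> filled in):
>>> 
>>>   enum A { A1 = 2, A2 = 2, A3 = 1, A4 = 1, A5 };  // A5 is implicitly 2
>>>   // Single pass: values 1 and 2 both repeat, so vectors are made for both.
>>>   // Double pass: only A5 lacks an initializer, so only value 2 gets one.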
>>>> 
>>>>> +
>>>>> +  for (DoubleVector::iterator DoubleVectorIter = EnumVector.begin(),
>>>>> +                              DoubleVectorEnd = EnumVector.end();
>>>>> +       DoubleVectorIter != DoubleVectorEnd; ++DoubleVectorIter) {
>>>>> +    SameValueVector *V = *DoubleVectorIter;
>>>>> +    if (V->size() == 1)
>>>>> +      continue;
>>>>> +
>>>>> +    SameValueVector::iterator I = V->begin();
>>>>> +    S.Diag((*I)->getLocation(), diag::warn_duplicate_enum_values)
>>>>> +      << (*I)->getName() << (*I)->getInitVal().toString(10)
>>>>> +      << (*I)->getSourceRange();
>>>>> +    ++I;
>>>>> +    for (SameValueVector::iterator E = V->end(); I != E; ++I)
>>>>> +      S.Diag((*I)->getLocation(), diag::note_duplicate_element)
>>>>> +        << (*I)->getName() << (*I)->getInitVal().toString(10)
>>>>> +        << (*I)->getSourceRange();
>>>>> +    delete V;
>>>>> +  }
>>>> 
>>>> 
>>>> This is more or less the same, essentially it becomes:
>>>> 
>>>> for (DuplicatesVector::iterator I = DupVector.begin(), E = DupVector.end();
>>>>      I != E; ++I) {
>>>>    ECDVector *Vec = *I;
>>>>    // do the diagnostic logic on Vec ...
>>>>    delete Vec;
>>>> }
>>>> 
>>>> Note that with my suggestions the vector's size is on the order of the 
>>>> number of duplicate constants, not the total number of constants.  If 
>>>> there are no duplicates, no work is required (including freeing memory).
>>>> 
>>>>> +}
>>>>> +
>>>>>  void Sema::ActOnEnumBody(SourceLocation EnumLoc, SourceLocation 
>>>>> LBraceLoc,
>>>>>                           SourceLocation RBraceLoc, Decl *EnumDeclX,
>>>>>                           Decl **Elements, unsigned NumElements,
>>>>> @@ -10709,6 +10868,7 @@
>>>>>      DeclsInPrototypeScope.push_back(Enum);
>>>>>  
>>>>>    CheckForUniqueEnumValues(*this, Elements, NumElements, Enum, EnumType);
>>>>> +  CheckForDuplicateEnumValues(*this, Elements, NumElements, Enum, 
>>>>> EnumType);
>>>>>  }
>>>>>  
>>>>>  Decl *Sema::ActOnFileScopeAsmDecl(Expr *expr,
>>>> 
>>>> I know this may all be nit-picky, but I really think reducing the malloc() 
>>>> traffic is worth pursuing to get a real understanding of the performance 
>>>> we can achieve here.
>>>> 
>>>> Thanks for forging ahead on this.
>>> 
>>> 
>>> <duplicate-enum-densemap2.patch>
>> 
>> 
> 
> 

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
