Re: [cfe-commits] [Patch] -Wduplicate-enum which fixes PR6343

Richard Trieu Tue, 28 Aug 2012 15:55:13 -0700

Timing information:
Three clangs were run, clang with no changes (control), duplicate enum with
PointerUnion (most recent patch), duplicate enum with DenseMap without
PointerUnion (next most recent patch).  Each run with -fsyntax-only and
-Wduplicate-enum for modified clangs.  Runs 1, 2, and 3 are files with only
enums.  Run 4 is a preprocessed Clang.


Key:
name: Average (Min-Max)

Run1:
Control: 13.763 (13.66-14.14)
PointerUnion: 14.046 (13.94-14.16)
DenseMap: 14.304 (14.24-14.39)

Run2:
Control: 20.189 (20.1-20.31)
PointerUnion: 20.514 (20.37-20.6)
DenseMap: 20.635 (20.56-20.7)

Run3:
Control: 26.715 (26.66-26.8)
PointerUnion: 26.928 (26.8-27.12)
DenseMap: 27.13 (27.05-27.22)

Run4:
Control: 29.686 (28.98-30.39)
PointerUnion: 29.706 (28.73-30.69)
DenseMap: 29.952 (29.3-30.63)

On Tue, Aug 28, 2012 at 2:12 PM, Ted Kremenek <[email protected]> wrote:

> How fast?  (particularly compared to the first implementation, and
> relative to -fsyntax-only)
>
> On Aug 28, 2012, at 11:48 AM, Richard Trieu <[email protected]> wrote:
>
> New patch with PointerUnion and DenseMap is slightly faster than the
> previous DenseMap patch.
>
> On Mon, Aug 27, 2012 at 9:51 PM, Ted Kremenek <[email protected]> wrote:
>
>> Thanks!  Quick question before I review it in more details: what is the
>> performance characteristics of this patch compared to the others?
>>
>> On Aug 27, 2012, at 11:35 AM, Richard Trieu <[email protected]> wrote:
>>
>> Incorporated most of the suggestions into this patch.  Still using a
>> double pass over the constants for the reasons outlined below.
>>
>> On Fri, Aug 17, 2012 at 10:00 PM, Ted Kremenek <[email protected]>wrote:
>>
>>> BTW, I wrote this two days ago.  For some reason my mail client didn't
>>> send it out until now.  My apologies for the delay.
>>>
>> No worries.  Had some issues that cropped up on template diffing that
>> took my time.
>>
>>>
>>> On Aug 15, 2012, at 10:11 PM, Ted Kremenek <[email protected]> wrote:
>>>
>>> On Aug 15, 2012, at 6:12 PM, Richard Trieu <[email protected]> wrote:
>>>
>>> On Tue, Aug 14, 2012 at 9:48 PM, Ted Kremenek <[email protected]>wrote:
>>>
>>>> On Aug 14, 2012, at 2:32 PM, Richard Trieu <[email protected]> wrote:
>>>>
>>>>  At a high level, I honestly find this logic to be more complicated
>>>>> than I would have expected.  The sorting seems unnecessary, and will 
>>>>> report
>>>>> diagnostics in an unnatural order (first based on enum constant value, 
>>>>> then
>>>>> on declaration order).  A straight linear pass seems more naturally to me,
>>>>> and DenseMap is very efficient.
>>>>>
>>>> Is there a comparison between the different containers in LLVM and the
>>>> STL containers?
>>>>
>>>>
>>>> This is a reasonable place to start:
>>>>
>>>>   http://llvm.org/docs/ProgrammersManual.html#ds_map
>>>>
>>>> The key with DenseMap is that it is probed hashtable.  There is one big
>>>> allocation for the entire table, instead of a bunch of buckets.  When
>>>> applicable, it can be very fast, and feels like the right data structure to
>>>> use here.
>>>>
>>>
>>> Duplicate enum detection, now with DenseMap.  The DenseMap maps a
>>> int64_t to a vector pointer.  0 and 1 were special keys for the DenseMap,
>>> so two separate pointers special cased for them.   The vectors pointers are
>>> stored in another vector in declaration order.  One pass is made over the
>>> enums to find ones without initializers.  These are used to create vectors.
>>>  A second pass through the enums populates the vectors.  Finally, a pass
>>> over the vector of vectors is used to generate all the warnings and notes.
>>>
>>> Run time is fairly consistent with the sorted vector implementation,
>>> which is max %3 difference against control.
>>> <duplicate-enum-densemap.patch>
>>>
>>>
>>> Thanks for working on this.  My main concern is this patch now has a lot
>>> of unnecessary malloc() traffic, which will certainly slow it down.
>>>  Comments inline:
>>>
>>> +
>>> +static int64_t GetInt64(const llvm::APSInt& Val) {
>>> +  return  Val.isSigned() ? Val.getSExtValue() : Val.getZExtValue();
>>> +}
>>> +
>>> +struct DenseMapInfoint64_t {
>>> +  static int64_t getEmptyKey() { return 0; }
>>> +  static int64_t getTombstoneKey() { return 1; }
>>> +  static unsigned getHashValue(const int64_t Val) {
>>> +    return (unsigned)(Val * 37);
>>> +  }
>>> +  static bool isEqual(const int64_t& LHS, const int64_t& RHS) {
>>> +    return LHS == RHS;
>>> +  }
>>> +};
>>>
>>>
>>> This trait class doesn't look like it was actually used.  The DenseMap
>>> below just uses the default trait for int64_t.
>>>
>>> I also still think we can so something a bit smarter here.  What I think
>>> we need to distinguish between is whether or not a constant has appeared
>>> more than once.  We're saving a bit of memory on the keys, but spending
>>> that savings elsewhere when we allocate the vectors unconditionally for
>>> each constant.
>>>
>>> +
>>> +// Emits a warning when an element is implicitly set a value that
>>> +// a previous element has already been set to.
>>> +static void CheckForDuplicateEnumValues(Sema &S, Decl **Elements,
>>> +                                        unsigned NumElements, EnumDecl
>>> *Enum,
>>> +                                        QualType EnumType) {
>>> +  if (S.Diags.getDiagnosticLevel(diag::warn_duplicate_enum_values,
>>> +                                 Enum->getLocation()) ==
>>> +      DiagnosticsEngine::Ignored)
>>> +    return;
>>> +  // Avoid anonymous enums
>>> +  if (!Enum->getIdentifier())
>>> +    return;
>>> +
>>> +  // Only check for small enums.
>>> +  if (Enum->getNumPositiveBits() > 63 || Enum->getNumNegativeBits() >
>>> 64)
>>> +    return;
>>> +
>>> +  typedef llvm::SmallVector<EnumConstantDecl*, 4> SameValueVector;
>>> +  typedef llvm::DenseMap<int64_t, SameValueVector*> ValueToVectorMap;
>>> +  typedef llvm::SmallVector<SameValueVector*, 10> DoubleVector;
>>> +  ValueToVectorMap EnumMap;
>>> +  DoubleVector EnumVector;
>>> +  SameValueVector *ZeroVector = 0, *OneVector = 0;
>>>
>>>
>>> It took me a while to understand what this was doing, so I feel it could
>>> really benefit from a comment.  This also appears to result in a ton of
>>> malloc traffic below.  Here's my suggestion:
>>>
>>>   typedef llvm::SmallVector<EnumConstantDecl*, 3> ECDVector;
>>>   typedef llvm::SmallVector<ECDVector *, 3> DuplicatesVector;
>>>
>>>   typedef llvm::PointerUnion<EnumConstantDecl*, ECDVector *>
>>> DeclOrVector;
>>>   typedef llvm::DenseMap<int64_t, DeclOrVector> ValueToVectorMap;
>>>
>>>   DuplicatesVector DupVector;
>>>   ValueToVectorMap EnumMap;
>>>
>>> The trick here is that the DenseMap maps from a constant to the first
>>> EnumConstantDecl it encounters.  Only if we encounter a second
>>> EnumConstantDecl with the same enum value do we pay the cost of allocating
>>> another vector.  This will drastically optimize in the common case, as
>>> calling malloc() is really slow.  Right now the code appears to be doing a
>>> malloc() for every enum constant, which is going to really penalize us here.
>>>
>>> +
>>> +  for (unsigned i = 0; i < NumElements; ++i) {
>>> +    EnumConstantDecl *ECD = cast<EnumConstantDecl>(Elements[i]);
>>> +    if (!ECD) {
>>> +      for (DoubleVector::iterator I = EnumVector.begin(), E =
>>> EnumVector.end();
>>> +           I != E; ++I)
>>> +        delete *I;
>>> +      return;
>>> +    }
>>>
>>>
>>> I don't quite understand this loop through DoubleVector here, but it
>>> looks like logic in case we want to return early and cleanup.  Is there a
>>> case where the EnumConstantDecl can be null?
>>>
>>> According to ActOnEnumBody, EnumConstantDecl is null if a diagnostic has
>> previously been emitted for the constant.  Since the enum
>> is possibly ill-formed, skip checking it.
>>
>>>
>>> +
>>> +    if (ECD->getInitExpr())
>>> +      continue;
>>> +
>>> +    int64_t Val = GetInt64(ECD->getInitVal());
>>> +
>>>
>>>
>>> Looks good.
>>>
>>> +    if (Val == 0) {
>>> +      if (ZeroVector) continue;
>>> +      ZeroVector = new SameValueVector();
>>> +      ZeroVector->push_back(ECD);
>>> +      EnumVector.push_back(ZeroVector);
>>> +    } else if (Val == 1) {
>>> +      if (OneVector) continue;
>>> +      OneVector = new SameValueVector();
>>> +      OneVector->push_back(ECD);
>>> +      EnumVector.push_back(OneVector);
>>> +    } else {
>>> +      if (EnumMap.find(Val) != EnumMap.end())
>>> +        continue;
>>> +      SameValueVector *ValueVector = new SameValueVector();
>>> +      ValueVector->push_back(ECD);
>>> +      EnumVector.push_back(ValueVector);
>>> +      EnumMap.insert(std::make_pair(Val, ValueVector));
>>>
>>>
>>> The "find()" followed by the "insert()" is wasteful.  It results in two
>>> lookups to the hash table when we could have just used one.  More on that
>>> later.
>>>
>>> +    }
>>> +  }
>>>
>>>
>>> IMO, this looks like a lot of complexity just to handle the fact that 0
>>> and 1 are special values for the DenseMap.  I don't really see this as the
>>> right tradeoff; the code is more complicated with marginal impact on memory
>>> usage or performance.
>>>
>>> If you humor me for a bit, consider using something else for the key,
>>> e.g.:
>>>
>>> struct DupKey {
>>>   int64_t val;
>>>   bool isTombstoneOrEmptyKey;
>>> };
>>>
>>> The idea is if 'isTombStoneOrEmptyKey' is true, we can use val = 0 or
>>> val = 1 to represent empty keys or tombstone entries.  Otherwise, it's an
>>> int64_t, with the full range of values.  We can define a DenseMap trait to
>>> do the right thing.  Yes, this costs a tiny bit more in storage, but it
>>> allows the data structure to handle the complete set of values in your
>>> domain, instead of resorting to complicating the core algorithm.  What I
>>> see here now is the same code essentially duplicated twice, which makes it
>>> harder to read and more error prone.
>>>
>>> If we use DupKey as our key for the DenseMap, we can instead do
>>> something like this:
>>>
>>>    DeclOrVector &entry = EnumMap[Val];  // Use default construction of
>>> 'entry'.
>>>    // Is the first time we encountered this constant?
>>>    if (entry.isNull()) {
>>>      entry = ECD;
>>>      continue;
>>>    }
>>>    // Is this the second time we encountered this constant?  If so,
>>>    // push the previous decl encountered and the one just encountered
>>>    // to a vector of duplicates.
>>>    if (EnumConstantDecl *D = entry.dyn_cast<EnumConstantDecl*>()) {
>>>      ECDVector *Vec = new ECDVector();
>>>      Vec->push_back(D);
>>>      Vec->push_back(ECD);
>>>
>>>      // Update the entry to refer to the duplicates.
>>>      entry = Vec;
>>>
>>>      // Store the duplicates in a vector we can consult later for
>>>      // quick emission of diagnostics.
>>>      DupVector.push_back(Vec);
>>>
>>>      // On to the next constant.
>>>      continue;
>>>    }
>>>    // Is this the third (or greater) time we encountered the constant?
>>>  If so,
>>>    // continue to add it to the existing vector.
>>>    ECDVector *Vec = entry.get<ECDVector*>();
>>>    Vec->push_back(ECD);
>>>
>>>
>>> With this code, we only allocate memory (beyond the DenseMap) when we
>>> encounter a duplicate that would be worth reporting.  In the common case,
>>> this savings in malloc traffic should be noticeable.
>>>
>>> Notice also that I used:
>>>
>>>      DeclOrVector &entry = EnumMap[Val];  // Use default construction of
>>> 'entry'.
>>>
>>> This results in a single lookup in the hashtable.  Since we plan on
>>> adding a value for a key no matter what, by using this idiom we allow the
>>> DenseMap to default construct an entry if it doesn't exist.  This results
>>> in a single hashtable lookup, from which we can modify the value in place.
>>>  This is obviously faster than doing a hashtable lookup twice.
>>>
>>> +
>>> +  for (unsigned i = 0; i < NumElements; ++i) {
>>> +    EnumConstantDecl *ECD = cast<EnumConstantDecl>(Elements[i]);
>>> +    if (!ValidDuplicateEnum(ECD, Enum))
>>> +      continue;
>>> +
>>> +    int64_t Val = GetInt64(ECD->getInitVal());
>>> +
>>> +    if (Val == 0) {
>>> +      if (!ZeroVector || *ZeroVector->begin() == ECD)
>>> +        continue;
>>> +      ZeroVector->push_back(ECD);
>>> +    } else if (Val == 1) {
>>> +      if (!OneVector || *OneVector->begin() == ECD)
>>> +        continue;
>>> +      OneVector->push_back(ECD);
>>> +    } else {
>>> +      ValueToVectorMap::iterator I = EnumMap.find(Val);
>>> +      if (I == EnumMap.end())
>>> +        continue;
>>> +      SameValueVector *V = I->second;
>>> +      if (*V->begin() == ECD)
>>> +        continue;
>>> +      V->push_back(ECD);
>>> +    }
>>> +  }
>>>
>>>
>>> This second loop looks unnecessary.  I think we can do everything we
>>> need to count duplicates with one loop.  Of course the ValidDuplicateEnum()
>>> would need to be hoisted to the first loop.
>>>
>>> Using two traverses allows two things to happen.  One, the first element
>> in the ECDVector will not have an initializer and will work with the
>> warning.  Otherwise, the vector needs to be searched for a proper enum
>> constant to use.  Two, it prevents unneeded creation of ECDVectors.  If we
>> have enum A { A1 = 2, A2 = 2, A3 = 1, A4 = 1, A5}; vectors for values 1 and
>> 2 are created using a single pass while only a vector for 2 will be created
>> using a double pass.
>>
>>>
>>> +
>>> +  for (DoubleVector::iterator DoubleVectorIter = EnumVector.begin(),
>>> +                              DoubleVectorEnd = EnumVector.end();
>>> +       DoubleVectorIter != DoubleVectorEnd; ++DoubleVectorIter) {
>>> +    SameValueVector *V = *DoubleVectorIter;
>>> +    if (V->size() == 1)
>>> +      continue;
>>> +
>>> +    SameValueVector::iterator I = V->begin();
>>> +    S.Diag((*I)->getLocation(), diag::warn_duplicate_enum_values)
>>> +      << (*I)->getName() << (*I)->getInitVal().toString(10)
>>> +      << (*I)->getSourceRange();
>>> +    ++I;
>>> +    for (SameValueVector::iterator E = V->end(); I != E; ++I)
>>> +      S.Diag((*I)->getLocation(), diag::note_duplicate_element)
>>> +        << (*I)->getName() << (*I)->getInitVal().toString(10)
>>> +        << (*I)->getSourceRange();
>>> +    delete V;
>>> +  }
>>>
>>>
>>>
>>> This is more or less the same, essentially it becomes:
>>>
>>> for (DuplicateVector::iterator I = DupVector.begin(), E =
>>> DupVector.end(); I != E; ++I) {
>>>    ECDVector *Vec = *I;
>>>    // do the diagnostic logic ...
>>>    delete *I;
>>> }
>>>
>>> Note that with my suggestions the vector has size on order of the number
>>> of duplicate constants, not the number of total constants.  If there are no
>>> duplicates, no work is required (including free'ing memory).
>>>
>>> +}
>>> +
>>>  void Sema::ActOnEnumBody(SourceLocation EnumLoc, SourceLocation
>>> LBraceLoc,
>>>                           SourceLocation RBraceLoc, Decl *EnumDeclX,
>>>                           Decl **Elements, unsigned NumElements,
>>> @@ -10709,6 +10868,7 @@
>>>      DeclsInPrototypeScope.push_back(Enum);
>>>
>>>    CheckForUniqueEnumValues(*this, Elements, NumElements, Enum,
>>> EnumType);
>>> +  CheckForDuplicateEnumValues(*this, Elements, NumElements, Enum,
>>> EnumType);
>>>  }
>>>
>>>  Decl *Sema::ActOnFileScopeAsmDecl(Expr *expr,
>>>
>>>
>>> I know this may all be nit-picky, but I really think trying to reduce
>>> the malloc() traffic is worth looking at to get a real understanding of the
>>> performance improvement that can be found here.
>>>
>>> Thanks for forging ahead on this.
>>> _______________________________________________
>>> cfe-commits mailing list
>>> [email protected]
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>
>>>
>>>
>> <duplicate-enum-densemap2.patch>
>>
>>
>>
>
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] [Patch] -Wduplicate-enum which fixes PR6343

Reply via email to