On Tue, 2019-12-10 at 16:00 -0700, Martin Sebor wrote:
> On 12/10/19 3:07 PM, David Malcolm wrote:
> > On Tue, 2019-12-03 at 15:41 -0700, Martin Sebor wrote:
> > > After allocating a new chunk of memory hash_table::expand() copy-
> > > assigns elements from the current array over the newly allocated
> > > elements without constructing them.
> > > 
> > > Similarly, after destroying existing elements, hash_table::
> > > empty_slow() assigns a new value to them.  This bug was
> > > introduced in r249234 when trying to deal with -Wclass-memaccess
> > > instances when the warning was first added.
> > > 
> > > Neither of these is a problem for PODs but both cause trouble
> > > when
> > > the hash_table contains elements of a type with a user-defined
> > > copy
> > > assignment operator.  There are at least two such instances in
> > > GCC
> > > already and a third is under review.
> > > 
> > > The attached patch avoids this by using placement new to
> > > construct
> > > new elements in uninitialized storage and restoring the original
> > > call to memset in hash_table::empty_slow(), analogously to what
> > > was done in r272893 for PR90923,
> > > 
> > > Longer term, to make these templates safely and efficiently
> > > usable
> > > with non-POD types with user-defined copy ctors and copy
> > > assignment
> > > operators I think these classes will probably need to be enhanced
> > > to make use of "assign" and "move" traits functions to
> > > efficiently
> > > assign and move objects.
> > > 
> > > Martin
> > > diff --git a/gcc/hash-table.h b/gcc/hash-table.h
> > > index ba5d64fb7a3..26bac624a08 100644
> > > --- a/gcc/hash-table.h
> > > +++ b/gcc/hash-table.h
> > > @@ -818,8 +818,7 @@ hash_table<Descriptor, Lazy,
> > > Allocator>::expand ()
> > >         if (!is_empty (x) && !is_deleted (x))
> > >           {
> > >             value_type *q = find_empty_slot_for_expand
> > > (Descriptor::hash (x));
> > > -
> > > -          *q = x;
> > > +   new ((void*) q) value_type (x);
> > >           }
> > >   
> > >         p++;
> > > @@ -869,14 +868,8 @@ hash_table<Descriptor, Lazy,
> > > Allocator>::empty_slow ()
> > >         m_size_prime_index = nindex;
> > >       }
> > >     else
> > > -    {
> > > -#ifndef BROKEN_VALUE_INITIALIZATION
> > > -      for ( ; size; ++entries, --size)
> > > - *entries = value_type ();
> > > -#else
> > > -      memset (entries, 0, size * sizeof (value_type));
> > > -#endif
> > > -    }
> > > +    memset ((void *) entries, 0, size * sizeof (value_type));
> > > +
> > >     m_n_deleted = 0;
> > >     m_n_elements = 0;
> > >   }
> > 
> > On attempting to rebase my analyzer branch I found that this patch
> > uncovered a bug in it, but also possibly a bug in hash-table.h.
> > 
> > In the analyzer branch I have a hash_map with a key where the
> > "empty"
> > value isn't all-zero-bits.
> 
> There's a test case for it in comment #3 in 92762.

Thanks.  I've adapted that into a selftest in this patch.

> > Specifically, in gcc/analyzer/program-state.h: sm_state_map has a
> > hash_map <svalue_id, entry_t> map_t, where svalue_id, the key type,
> > has
> > hash traits:
> > 
> > template <>
> > inline void
> > pod_hash_traits<svalue_id>::mark_empty (value_type &v)
> > {
> >    v = svalue_id::null ();
> > }
> > 
> > which is a -1 int (all ones).
> > 
> > memset to zero bits populates the "empty" slots with a key with a
> > non-
> > empty value for this key type, effectively corrupting the data
> > structure (luckily a selftest caught this).
> > 
> > Looking at the above patch, it looks like I was unwittingly relying
> > on
> > two things:
> > (a) #ifndef BROKEN_VALUE_INITIALIZATION, and
> > (b) that the default ctor for my key type was the "empty" value.
> > 
> > However, shouldn't empty_slow be calling the Descriptor::mark_empty
> > function?  Doesn't this memset code assume that the "empty" value
> > of
> > the hash_table entry is all-zeroes (and thus imposing the same
> > assumption on all hash_maps' key and value types?) - which isn't
> > the
> > case for my code.  I don't remember seeing that assumption
> > documented.

Correcting myself, I *think* this would impose the assumption that all
hash_maps' keys types "empty" value is all-zeroes; I don't think that
holds for the value types though.

> > 
> > Or am I misreading this?
> 
> IIRC, I had tried the below but it caused problems during bootstrap
> that I didn't spend enough time investigating.
> 
>    for (size_t i = size - 1; i < size; i--)
>      if (!is_empty (entries[i]) && !is_deleted (entries[i]))
>        {
>          Descriptor::remove (entries[i]);
>          mark_empty (entries[i]);
>        }
> 
> (I think it also caused compilation error in one of the IPA passes
> because of a missing mark_empty function in its traits class; maybe
> ipa_bit_ggc_hash_traits in ipa-prop.c).
> 
> To be honest, I'm not sure I understand why the memset is even there.
> It seems like a leak to me (I can reproduce leaks with a user-defined
> type).  I was going to get back to it at some point.
> 
> Martin

Thanks for all the info.

Considering the change you tried to hash_table::empty_slow, AIUI, an
entry in a hash_table can be in one of three states:
(a) "empty"
(b) "deleted"
(c) a properly constructed object

If I'm reading the change you tried above correctly, it's only marking
as empty the entries that are in state (c); it's leaving "deleted"
entries untouched.

Later in empty_slow, control flow splits into resize vs non-resize
cases.  In the resize case, alloc_entries is called, and alloc_entries
has this loop:
  for (size_t i = 0; i < n; i++)
    mark_empty (nentries[i]);
thus ensuring that every entry is in the "empty" state.
However, for the non-resize case, we now currently reach the code
touched in r279139 (the above patch):
   memset (entries, 0, size * sizeof (value_type));
Hence if a hash_map is emptied without being resized, we end up with
either entries still being marked as deleted despite m_n_deleted == 0
(the fix you tried above), or all the entries being zero-filled and
thus effectively things if zero isn't an "empty" state (current trunk).

The following patch updates hash_table::empty_slow so that the
non-resize case uses the same loop as the resize case, thus marking all
entries as empty, without assuming that in must be all-zeroes.

The patch is actually on top of:
  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01514.html
which merely adds more selftests to hash-map-tests.c and which Jeff
already approved (the hash_map in that new test used an EMPTY value of
-1, but the various ops happened not to call empty_slow).

Hopefully I'm understanding things correctly.

Is it OK for a hash_map key to have a "empty" value that isn't
all-zeroes (and thus the same for a hash_table entry)?

Is the following patch OK for trunk?

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu (albeit
in conjuction with the rest of the analyzer patch kit, which it fixes;
I'll retest cleanly before committing if approved).

Thanks
Dave

gcc/ChangeLog:
        * hash-map-tests.c (selftest::test_nonzero_empty_key): New selftest.
        (selftest::hash_map_tests_c_tests): Call it.
        * hash-table.h
        (hash_table<Descriptor, Lazy, Allocator>::empty_slow): When not
        resizing the entries buffer, replace the memset to zero with a
        loop that calls mark_empty on all elements.
---
 gcc/hash-map-tests.c | 22 ++++++++++++++++++++++
 gcc/hash-table.h     |  5 ++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/hash-map-tests.c b/gcc/hash-map-tests.c
index 9a13a80442c8..9a92425df5db 100644
--- a/gcc/hash-map-tests.c
+++ b/gcc/hash-map-tests.c
@@ -278,6 +278,27 @@ test_map_of_type_with_ctor_and_dtor ()
   }
 }
 
+/* Test calling empty on a hash_map that has a key type with non-zero
+   "empty" value.  */
+
+static void
+test_nonzero_empty_key ()
+{
+  typedef int_hash<int, INT_MIN, INT_MAX> IntHash;
+  hash_map<int, int, simple_hashmap_traits<IntHash, int> > x;
+
+  for (int i = 1; i != 32; ++i)
+    x.put (i, i);
+
+  ASSERT_EQ (x.get (0), NULL);
+  ASSERT_EQ (*x.get (1), 1);
+
+  x.empty ();
+
+  ASSERT_EQ (x.get (0), NULL);
+  ASSERT_EQ (x.get (1), NULL);
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -286,6 +307,7 @@ hash_map_tests_c_tests ()
   test_map_of_strings_to_int ();
   test_map_of_int_to_strings ();
   test_map_of_type_with_ctor_and_dtor ();
+  test_nonzero_empty_key ();
 }
 
 } // namespace selftest
diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index 26bac624a082..126bd6d22ad5 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -868,7 +868,10 @@ hash_table<Descriptor, Lazy, Allocator>::empty_slow ()
       m_size_prime_index = nindex;
     }
   else
-    memset ((void *) entries, 0, size * sizeof (value_type));
+    {
+      for (size_t i = 0; i < size; i++)
+       mark_empty (entries[i]);
+    }
 
   m_n_deleted = 0;
   m_n_elements = 0;
-- 
2.21.0

Reply via email to