> Am 08.09.2022 um 00:05 schrieb Lewis Hyatt via Gcc-patches 
> <gcc-patches@gcc.gnu.org>:
> 
> The function rebuild_location_adhoc_htab() was meant to reconstruct the
> adhoc location hash map after restoring a line_maps instance from a
> PCH. However, the function has never performed as intended because it
> missed the last step of adding the data into the newly reconstructed hash
> map. This patch fixes that.
> 
> It does not seem possible to construct a test case such that the current
> incorrect behavior is observable as a compiler issue. It would be
> observable, if it were possible for a precompiled header to contain an
> adhoc location with a non-zero custom data pointer. But currently, such
> data pointers are used only by the middle end to track inlining
> information, and this happens later, too late to show up in a PCH.
> 
> I also noted that location_adhoc_data_update, which updates the hash map
> pointers in a different scenario, was relying on undefined pointer
> arithmetic behavior. I'm not aware of this having caused any issue in
> practice, but in this patch I have also changed it to use defined pointer
> operations instead.


Ok.

Thanks,
Richard 

> libcpp/ChangeLog:
> 
>    * line-map.cc (location_adhoc_data_update): Remove reliance on
>    undefined behavior.
>    (get_combined_adhoc_loc): Likewise.
>    (rebuild_location_adhoc_htab): Fix issue where the htab was not
>    properly updated.
> ---
> 
> Notes:
>    Hello-
> 
>    While working on something unrelated in line-map.cc, I noticed that the
>    function rebuild_location_adhoc_htab(), whose job is to reconstruct the
>    adhoc data hash table after a line_maps instance is reconstructed from PCH,
>    doesn't actually rebuild the hash table at all:
> 
>    void
>    rebuild_location_adhoc_htab (line_maps *set)
>    {
>      unsigned i;
>      set->location_adhoc_data_map.htab =
>          htab_create (100, location_adhoc_data_hash, location_adhoc_data_eq, 
> NULL);
>      for (i = 0; i < set->location_adhoc_data_map.curr_loc; i++)
>        htab_find_slot (set->location_adhoc_data_map.htab,
>                        set->location_adhoc_data_map.data + i, INSERT);
>     ^^^^^^^^^^^^^^
>    }
> 
>    In order to have the intended effect, it needs to set the return value of
>    htab_find_slot to be set->location_adhoc_data_map.data + i, otherwise it
>    doesn't effectively do anything except make the hash table think it has
>    curr_loc elements set, when in fact it has 0. Subsequent calls to
>    htab_traverse, for instance, will do nothing, and any lookups will also 
> fail.
> 
>    I tried for some time to construct a test case that would demonstrate an
>    observable consequence of this issue, but I don't think it's possible... 
> The
>    nontrivial uses of this hash map are in the middle end (e.g. to produce the
>    trace of where a given expression was inlined from), and all this happens
>    after PCH was read in, and doesn't require any state to be read from the
>    PCH. It would become apparent in the future, however, if the ability to
>    attach arbitrary data to an adhoc location were used in other ways, perhaps
>    somewhere in libcpp; in that hypothetical case, the data would be lost when
>    reading back in the PCH.
> 
>    There is another kinda related function, location_adhoc_data_update, which
>    updates all the pointers in the hash map whenever they are invalidated. It
>    seems to me, that this function invokes undefined behavior, since it adds 
> an
>    arbitrary offset to the pointers, which do not necessarily point into the
>    same array after they were realloced. I don't think it's led to any 
> problems
>    in practice but in this patch I also changed that to use well-defined
>    operations. Note sure how people may feel about that, since it does require
>    on the surface 2x as many operations with this change, but I can't see how
>    the current approach is guaranteed to be valid on all architectures?
> 
>    Bootstrap + regtest looks good on x86-64 Linux. Thanks a lot to whoever may
>    have time to take a look at it.
> 
>    -Lewis
> 
> libcpp/line-map.cc | 41 +++++++++++++++++++++++++++--------------
> 1 file changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
> index 62077c3857c..391f1d4bbc1 100644
> --- a/libcpp/line-map.cc
> +++ b/libcpp/line-map.cc
> @@ -85,27 +85,38 @@ location_adhoc_data_eq (const void *l1, const void *l2)
>      && lb1->data == lb2->data);
> }
> 
> -/* Update the hashtable when location_adhoc_data is reallocated.  */
> +/* Update the hashtable when location_adhoc_data_map::data is reallocated.
> +   The param is an array of two pointers, the previous value of the data
> +   pointer, and then the new value.  The pointers stored in the hash map
> +   are then rebased to be relative to the new data pointer instead of the
> +   old one.  */
> 
> static int
> -location_adhoc_data_update (void **slot, void *data)
> +location_adhoc_data_update (void **slot_v, void *param_v)
> {
> -  *((char **) slot)
> -    = (char *) ((uintptr_t) *((char **) slot) + *((ptrdiff_t *) data));
> +  const auto slot = reinterpret_cast<location_adhoc_data **> (slot_v);
> +  const auto param = static_cast<location_adhoc_data **> (param_v);
> +  *slot = (*slot - param[0]) + param[1];
>   return 1;
> }
> 
> -/* Rebuild the hash table from the location adhoc data.  */
> +/* The adhoc data hash table is not part of the GGC infrastructure, so it was
> +   not initialized when SET was reconstructed from PCH; take care of that by
> +   rebuilding it from scratch.  */
> 
> void
> rebuild_location_adhoc_htab (line_maps *set)
> {
> -  unsigned i;
>   set->location_adhoc_data_map.htab =
>       htab_create (100, location_adhoc_data_hash, location_adhoc_data_eq, 
> NULL);
> -  for (i = 0; i < set->location_adhoc_data_map.curr_loc; i++)
> -    htab_find_slot (set->location_adhoc_data_map.htab,
> -            set->location_adhoc_data_map.data + i, INSERT);
> +  for (auto p = set->location_adhoc_data_map.data,
> +        end = p + set->location_adhoc_data_map.curr_loc;
> +      p != end; ++p)
> +    {
> +      const auto slot = reinterpret_cast<location_adhoc_data **>
> +    (htab_find_slot (set->location_adhoc_data_map.htab, p, INSERT));
> +      *slot = p;
> +    }
> }
> 
> /* Helper function for get_combined_adhoc_loc.
> @@ -211,8 +222,7 @@ get_combined_adhoc_loc (line_maps *set,
>       if (set->location_adhoc_data_map.curr_loc >=
>      set->location_adhoc_data_map.allocated)
>    {
> -      char *orig_data = (char *) set->location_adhoc_data_map.data;
> -      ptrdiff_t offset;
> +      const auto orig_data = set->location_adhoc_data_map.data;
>      /* Cast away extern "C" from the type of xrealloc.  */
>      line_map_realloc reallocator = (set->reallocator
>                      ? set->reallocator
> @@ -226,10 +236,13 @@ get_combined_adhoc_loc (line_maps *set,
>          reallocator (set->location_adhoc_data_map.data,
>               set->location_adhoc_data_map.allocated
>               * sizeof (struct location_adhoc_data));
> -      offset = (char *) (set->location_adhoc_data_map.data) - orig_data;
>      if (set->location_adhoc_data_map.allocated > 128)
> -        htab_traverse (set->location_adhoc_data_map.htab,
> -               location_adhoc_data_update, &offset);
> +        {
> +          location_adhoc_data *param[2]
> +        = {orig_data, set->location_adhoc_data_map.data};
> +          htab_traverse (set->location_adhoc_data_map.htab,
> +                 location_adhoc_data_update, param);
> +        }
>    }
>       *slot = set->location_adhoc_data_map.data
>          + set->location_adhoc_data_map.curr_loc;

Reply via email to