Re: std::vector move assign patch

Jonathan Wakely Tue, 25 Apr 2017 09:54:27 -0700

On 25/04/17 18:12 +0200, Marc Glisse wrote:

On Tue, 25 Apr 2017, Jonathan Wakely wrote:
On 25/04/17 17:23 +0200, Marc Glisse wrote:
On Tue, 25 Apr 2017, Jonathan Wakely wrote:
On 24/04/17 22:10 +0200, Marc Glisse wrote:
It seems that this patch had 2 consequences that may or maynot have been planned. Consider this example (from PR64601)
#include <vector>
typedef std::vector<int> V;
void f(V&v,V&w){ V(std::move(w)).swap(v); }
void g(V&v,V&w){ v=std::move(w); }
1) We generate shorter code for f than for g, probably sincethe fix for PR59738. g ends up zeroing v, copying w to v, andfinally zeroing w, and for weird reasons (and because we swapthe members one by one) the standard prevents us from assumingthat v and w do not overlap in weird ways so we cannotoptimize as much as one might expect.
f has an additional precondition (that the allocators of the vectors
being swapped must propagate on swap or be equal) and so the swap code
doesn't have to worry about non-equal allocators.

g has to be able to cope with the case where the allocator doesn't
propagate and isn't equal, and so is more complicated.

However, the propagation trait is known at compile-time, and for the
common case so is the equality condition, so it's unfortunate if that
can't be simplified (I'm sure you've analysed it carefully already
though!)
The code isn't horrible. With f, we get:

      movq    (%rsi), %r8
      movq    8(%rsi), %rcx
      movq    $0, (%rsi)
      movq    $0, 8(%rsi)
      movq    16(%rsi), %rdx
      movq    $0, 16(%rsi)
      movq    (%rdi), %rax
      movq    %rcx, 8(%rdi)
      movq    %r8, (%rdi)
      movq    %rdx, 16(%rdi)
      testq   %rax, %rax
which seems quite optimal: read each pointer from w, write them tov, write 0s in w, that's 9 memory operations, +1 to read thepointer from w and possibly call delete on it.
Of course I should have mentioned that writing 0s is done between theread and the other write, which differentiates it from g.
With g:

      movq    $0, 8(%rdi)
      movq    (%rdi), %rax
      movq    $0, 16(%rdi)
      movq    $0, (%rdi)
      movq    (%rsi), %rdx
      movq    %rdx, (%rdi)
      movq    8(%rsi), %rcx
      movq    $0, (%rsi)
      movq    8(%rdi), %rdx
      movq    %rcx, 8(%rdi)
      movq    16(%rsi), %rcx
      movq    %rdx, 8(%rsi)
      movq    16(%rdi), %rdx
      movq    %rcx, 16(%rdi)
      movq    %rdx, 16(%rsi)
      testq   %rax, %rax
That's only 5 more memory operations. If I tweak vector swappingto avoid calling swap on each member (which drops type-basedaliasing information, that was the topic of PR64601)
I didn't really understand the discussion in the PR. I find that's
true of most TBAA discussions.

std::swap(T& x, T& y) is hard to optimise because we don't know that
the dynamic type of the thing at &x is the same type as T?
std::swap<int*> only knows that it is dealing with pointers tointegers, so _M_start from one vector might be in the same location as_M_finish from some other vector...
When I access __x._M_start directly, I am accessing some part of a_Vector_impl, and 2 _Vector_impl can only be the same or disjoint,they cannot partially overlap.


I see. I'm surprised that after the std::swap calls get inlined, so
that the references can be "seen through", the pessimisation is still
present.

      void _M_swap_data(_Vector_impl& __x) _GLIBCXX_NOEXCEPT
      {
        pointer tmp;
#define MARC(x,y) tmp=x; x=y; y=tmp
        MARC(_M_start, __x._M_start);
        MARC(_M_finish, __x._M_finish);
        MARC(_M_end_of_storage, __x._M_end_of_storage);
      }

this gets down to 13, which is kind of sensible
* 0 the elements of v -> 3 ops
* read the elements of w -> 3 ops
* write them to v -> 3 ops
* 0 the elements of w -> 3 ops
(+1 to get the pointer that we might call delete on)

The first step of zeroing the elements of v is redundant
* if v and w don't alias, we are going to overwrite those 0s instep 3 without ever reading them
* if v and w are the same, we are going to write those 0s in step 4 anyway

but that's hard for the optimizers to notice.
I didn't try hard to find a nice C++ way to get an equivalent of gthat generates the optimal number of operations, but it would be alittle ugly to write in operator=
this->_M_impl._M_finish = x._M_impl._M_finish; x._M_impl._M_finish = 0;
same for _M_end_of_storage and _M_start, and remembering to usethe original this->_M_impl._M_start for delete.
I'm not opposed to writing it out by hand. Operations on std::vector
should be as fast as possible, and move-assignment should be cheap.
I am not sure it would make a noticable difference. I might do it atsome point in the future...
By the way, do we have a policy on writing if(p)delete p; vs directlydelete p; which does nothing for p==0? There are cases where the extrashortcut is a nice optimization, others where it is redundant work. At-Os, the middle-end could optimize if(p)free(p) to free(p) in thefuture.


No official policy. I always consider the if (p) to be redundant
noise, but that's based on readability and correctness, not
performance.

Re: std::vector move assign patch

Reply via email to