Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-25 Thread Christophe Lyon
On Fri, 25 Oct 2019 at 12:08, Richard Earnshaw (lists)
 wrote:
>
> On 24/10/2019 17:10, Richard Earnshaw (lists) wrote:
> > On 24/10/2019 11:16, Christophe Lyon wrote:
> >> On 23/10/2019 15:21, Richard Earnshaw (lists) wrote:
> >>> On 23/10/2019 09:28, Christophe Lyon wrote:
>  On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:
> > On 21/10/2019 12:51, Christophe Lyon wrote:
> >> On 18/10/2019 21:48, Richard Earnshaw wrote:
> >>> Each patch should produce a working compiler (it did when it was
> >>> originally written), though since the patch set has been re-ordered
> >>> slightly there is a possibility that some of the intermediate steps
> >>> may have missing test updates that are only cleaned up later.
> >>> However, only the end of the series should be considered complete.
> >>> I've kept the patch as a series to permit easier regression hunting
> >>> should that prove necessary.
> >>
> >> Thanks for this information: my validation system was designed in
> >> such a way that it will run the GCC testsuite after each of your
> >> patches, so I'll keep in mind not to report regressions (I've
> >> noticed several already).
> >>
> >>
> >> I can perform a manual validation taking your 29 patches as a
> >> single one and compare the results with those of the revision
> >> preceding the one where you committed patch #1. Do you think it
> >> would be useful?
> >>
> >>
> >> Christophe
> >>
> >>
> >
> > I think if you can filter out any that are removed by later patches
> > and then report against the patch that caused the regression itself
> > then that would be the best.  But I realise that would be more work
> > for you, so a round-up against the combined set would be OK.
> >
> > BTW, I'm aware of an issue with the compiler now generating
> >
> >   reg, reg, shift 
> >
> > in Thumb2; no need to report that again.
> >
> > Thanks,
> > R.
> > .
> >
> 
> 
>  Hi Richard,
> 
>  The validation of the whole set shows 1 regression, which was also
>  reported by the validation of r277179 (early split most DImode
>  comparison operations)
> 
>  When GCC is configured as:
>  --target arm-none-eabi
>  --with-mode default
>  --with-cpu default
>  --with-fpu default
>  (that is, no --with-mode, --with-cpu, --with-fpu option)
>  I'm using binutils-2.28 and newlib-3.1.0
> 
>  I can see:
>  FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
>  (whatever -std=gnu++XX option)
> >>>
> >>> That's strange.  The assembler code generated for that test is
> >>> unchanged from before the patch series, so I can't see how it can't
> >>> be a problem in the test itself.  What's more, I can't seem to
> >>> reproduce this myself.
> >>
> >> As you have noticed, I have created PR92207 to help understand this.
> >>
> >>>
> >>> Similarly, in my build the code for _Znwj, malloc, malloc_r and
> >>> free_r are also unchanged, while the malloc_[un]lock functions are
> >>> empty stubs (not surprising as we aren't multi-threaded).
> >>>
> >>> So the only thing that looks to have really changed are the linker
> >>> offsets (some of the library code has changed, but I don't think it's
> >>> really reached in practice, so shouldn't be relevant).
> >>>
> 
>  I'm executing the tests using qemu-4.1.0 -cpu arm926
>  The qemu traces show that code enters main, then _Znwj (operator
>  new), then _malloc_r
>  The qemu traces end with:
> >>>
> >>> What do you mean by 'end with'?  What's the failure mode of the test?
> >>> A crash, or the test exiting with a failure code?
> >>>
> >> qemu complains with:
> >> qemu: uncaught target signal 11 (Segmentation fault) - core dumped
> >> Segmentation fault (core dumped)
> >>
> >> 'end with' because my automated validation builds do not keep the full
> >> execution traces (that would need too much disk space)
> >>
> >
> > As I've said in the PR, this looks like a bug in the qemu+newlib code.
> > We call sbrk() which says, OK, but then the page isn't mapped by qemu
> > into the process and it then faults.
> >
> > So I think these changes are off the hook, it's just bad luck that they
> > expose the issue at this point in time.
> >
> > R.
> >
>
> I've closed the PR as invalid, because this is a newlib bug that is
> fixed on trunk.  https://sourceware.org/ml/newlib/2019/msg00413.html
>
Thanks for the analysis.
It looks like I have to upgrade the newlib version I'm using for validations.
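
To make the failure mode described above concrete (sbrk() reports success, but the returned memory is not actually usable), here is a minimal sketch of a bump allocator that refuses to move the break outside the memory it really owns. The names `toy_sbrk`, `toy_heap` and `TOY_HEAP_SIZE` are hypothetical and exist only for illustration; this is not the newlib or qemu code, and it makes no claim about what the linked newlib fix actually changes.

```c
#include <stddef.h>

/* Hypothetical fixed arena standing in for the memory that is really mapped. */
#define TOY_HEAP_SIZE 4096u
static unsigned char toy_heap[TOY_HEAP_SIZE];
static size_t toy_brk = 0;  /* current break, as an offset into toy_heap */

/* Move the break by incr bytes.  Returns the previous break on success,
   or (void *)-1 if the request would leave the arena. */
void *toy_sbrk(ptrdiff_t incr)
{
    void *prev = toy_heap + toy_brk;

    if (incr >= 0) {
        /* Growing: refuse if less than incr bytes remain. */
        if ((size_t)incr > TOY_HEAP_SIZE - toy_brk)
            return (void *)-1;
        toy_brk += (size_t)incr;
    } else {
        /* Shrinking: refuse to go below the start of the arena. */
        size_t dec = (size_t)0 - (size_t)incr;
        if (dec > toy_brk)
            return (void *)-1;
        toy_brk -= dec;
    }
    return prev;
}
```

An allocator without the two range checks would happily return an address past the end of the arena, and the caller would only find out on the first store, which matches the symptom of a segmentation fault inside _malloc_r.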

Christophe

> R.


Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-25 Thread Richard Earnshaw (lists)

On 24/10/2019 17:10, Richard Earnshaw (lists) wrote:

On 24/10/2019 11:16, Christophe Lyon wrote:

On 23/10/2019 15:21, Richard Earnshaw (lists) wrote:

On 23/10/2019 09:28, Christophe Lyon wrote:

On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:

On 21/10/2019 12:51, Christophe Lyon wrote:

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system was designed in 
such a way that it will run the GCC testsuite after each of your 
patches, so I'll keep in mind not to report regressions (I've 
noticed several already).



I can perform a manual validation taking your 29 patches as a 
single one and compare the results with those of the revision 
preceding the one where you committed patch #1. Do you think it
would be useful?



Christophe




I think if you can filter out any that are removed by later patches 
and then report against the patch that caused the regression itself 
then that would be the best.  But I realise that would be more work 
for you, so a round-up against the combined set would be OK.


BTW, I'm aware of an issue with the compiler now generating

  reg, reg, shift 

in Thumb2; no need to report that again.

Thanks,
R.
.




Hi Richard,

The validation of the whole set shows 1 regression, which was also 
reported by the validation of r277179 (early split most DImode 
comparison operations)


When GCC is configured as:
--target arm-none-eabi
--with-mode default
--with-cpu default
--with-fpu default
(that is, no --with-mode, --with-cpu, --with-fpu option)
I'm using binutils-2.28 and newlib-3.1.0

I can see:
FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
(whatever -std=gnu++XX option)


That's strange.  The assembler code generated for that test is 
unchanged from before the patch series, so I can't see how it can't 
be a problem in the test itself.  What's more, I can't seem to 
reproduce this myself.


As you have noticed, I have created PR92207 to help understand this.



Similarly, in my build the code for _Znwj, malloc, malloc_r and 
free_r are also unchanged, while the malloc_[un]lock functions are 
empty stubs (not surprising as we aren't multi-threaded).


So the only thing that looks to have really changed are the linker 
offsets (some of the library code has changed, but I don't think it's 
really reached in practice, so shouldn't be relevant).




I'm executing the tests using qemu-4.1.0 -cpu arm926
The qemu traces show that code enters main, then _Znwj (operator
new), then _malloc_r

The qemu traces end with:


What do you mean by 'end with'?  What's the failure mode of the test? 
A crash, or the test exiting with a failure code?



qemu complains with:
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)

'end with' because my automated validation builds do not keep the full 
execution traces (that would need too much disk space)




As I've said in the PR, this looks like a bug in the qemu+newlib code. 
We call sbrk() which says, OK, but then the page isn't mapped by qemu 
into the process and it then faults.


So I think these changes are off the hook, it's just bad luck that they 
expose the issue at this point in time.


R.



I've closed the PR as invalid, because this is a newlib bug that is 
fixed on trunk.  https://sourceware.org/ml/newlib/2019/msg00413.html


R.


Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-24 Thread Richard Earnshaw (lists)

On 24/10/2019 11:16, Christophe Lyon wrote:

On 23/10/2019 15:21, Richard Earnshaw (lists) wrote:

On 23/10/2019 09:28, Christophe Lyon wrote:

On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:

On 21/10/2019 12:51, Christophe Lyon wrote:

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system was designed in 
such a way that it will run the GCC testsuite after each of your 
patches, so I'll keep in mind not to report regressions (I've 
noticed several already).



I can perform a manual validation taking your 29 patches as a 
single one and compare the results with those of the revision 
preceding the one where you committed patch #1. Do you think it
would be useful?



Christophe




I think if you can filter out any that are removed by later patches 
and then report against the patch that caused the regression itself 
then that would be the best.  But I realise that would be more work 
for you, so a round-up against the combined set would be OK.


BTW, I'm aware of an issue with the compiler now generating

  reg, reg, shift 

in Thumb2; no need to report that again.

Thanks,
R.
.




Hi Richard,

The validation of the whole set shows 1 regression, which was also 
reported by the validation of r277179 (early split most DImode 
comparison operations)


When GCC is configured as:
--target arm-none-eabi
--with-mode default
--with-cpu default
--with-fpu default
(that is, no --with-mode, --with-cpu, --with-fpu option)
I'm using binutils-2.28 and newlib-3.1.0

I can see:
FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
(whatever -std=gnu++XX option)


That's strange.  The assembler code generated for that test is 
unchanged from before the patch series, so I can't see how it can't be 
a problem in the test itself.  What's more, I can't seem to reproduce 
this myself.


As you have noticed, I have created PR92207 to help understand this.



Similarly, in my build the code for _Znwj, malloc, malloc_r and free_r 
are also unchanged, while the malloc_[un]lock functions are empty 
stubs (not surprising as we aren't multi-threaded).


So the only thing that looks to have really changed are the linker 
offsets (some of the library code has changed, but I don't think it's 
really reached in practice, so shouldn't be relevant).




I'm executing the tests using qemu-4.1.0 -cpu arm926
The qemu traces show that code enters main, then _Znwj (operator
new), then _malloc_r

The qemu traces end with:


What do you mean by 'end with'?  What's the failure mode of the test?  
A crash, or the test exiting with a failure code?



qemu complains with:
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)

'end with' because my automated validation builds do not keep the full 
execution traces (that would need too much disk space)




As I've said in the PR, this looks like a bug in the qemu+newlib code. 
We call sbrk() which says, OK, but then the page isn't mapped by qemu 
into the process and it then faults.


So I think these changes are off the hook, it's just bad luck that they 
expose the issue at this point in time.


R.



Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-24 Thread Christophe Lyon

On 23/10/2019 15:21, Richard Earnshaw (lists) wrote:

On 23/10/2019 09:28, Christophe Lyon wrote:

On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:

On 21/10/2019 12:51, Christophe Lyon wrote:

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system was designed in such a way 
that it will run the GCC testsuite after each of your patches, so I'll keep in 
mind not to report regressions (I've noticed several already).


I can perform a manual validation taking your 29 patches as a single one and 
compare the results with those of the revision preceding the one where you
committed patch #1. Do you think it would be useful?


Christophe




I think if you can filter out any that are removed by later patches and then 
report against the patch that caused the regression itself then that would be 
the best.  But I realise that would be more work for you, so a round-up against 
the combined set would be OK.

BTW, I'm aware of an issue with the compiler now generating

  reg, reg, shift 

in Thumb2; no need to report that again.

Thanks,
R.
.




Hi Richard,

The validation of the whole set shows 1 regression, which was also reported by 
the validation of r277179 (early split most DImode comparison operations)

When GCC is configured as:
--target arm-none-eabi
--with-mode default
--with-cpu default
--with-fpu default
(that is, no --with-mode, --with-cpu, --with-fpu option)
I'm using binutils-2.28 and newlib-3.1.0

I can see:
FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
(whatever -std=gnu++XX option)


That's strange.  The assembler code generated for that test is unchanged from 
before the patch series, so I can't see how it can't be a problem in the test 
itself.  What's more, I can't seem to reproduce this myself.


As you have noticed, I have created PR92207 to help understand this.



Similarly, in my build the code for _Znwj, malloc, malloc_r and free_r are also 
unchanged, while the malloc_[un]lock functions are empty stubs (not surprising 
as we aren't multi-threaded).

So the only thing that looks to have really changed are the linker offsets 
(some of the library code has changed, but I don't think it's really reached in 
practice, so shouldn't be relevant).



I'm executing the tests using qemu-4.1.0 -cpu arm926
The qemu traces show that code enters main, then _Znwj (operator new), then
_malloc_r
The qemu traces end with:


What do you mean by 'end with'?  What's the failure mode of the test?  A crash, 
or the test exiting with a failure code?


qemu complains with:
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)

'end with' because my automated validation builds do not keep the full 
execution traces (that would need too much disk space)


IN: _malloc_r
0x00019224:  e3a00ffe  mov  r0, #0x3f8
0x00019228:  e3a0c07f  mov  ip, #0x7f
0x0001922c:  e3a0e07e  mov  lr, #0x7e
0x00019230:  eafffe41  b    #0x18b3c

R00=00049418 R01= R02=0554 R03=0004
R04= R05=0808 R06=00049418 R07=
R08= R09= R10=000492d8 R11=fffeb4b4
R12=0060 R13=fffeb460 R14=00018b14 R15=00019224
PSR=2010 --C- A usr32

IN: _malloc_r
0x00018b3c:  e59f76f8  ldr  r7, [pc, #0x6f8]
0x00018b40:  e087  add  r0, r7, r0
0x00018b44:  e5903004  ldr  r3, [r0, #4]
0x00018b48:  e248  sub  r0, r0, #8
0x00018b4c:  e153  cmp  r0, r3
0x00018b50:  1a05  bne  #0x18b6c

But this block neither jumps to, nor falls through to 

R00=03f8 R01= R02=0554 R03=0004
R04= R05=0808 R06=00049418 R07=
R08= R09= R10=000492d8 R11=fffeb4b4
R12=007f R13=fffeb460 R14=007e R15=00018b3c
PSR=2010 --C- A usr32
R00=00049c30 R01= R02=0554 R03=00049c30
R04= R05=0808 R06=00049418 R07=00049840
R08= R09= R10=000492d8 R11=fffeb4b4
R12=007f R13=fffeb460 R14=007e R15=00018b54
PSR=6010 -ZC- A usr32

IN: _malloc_r


...here.  So there's some trace missing by the looks of it; or some other 
problem.


0x00019120:  e1a02a0b  lsl  r2, fp, #0x14
0x00019124:  e1a02a22  lsr  r2, r2, #0x14
0x00019128:  e352  cmp  r2, #0
0x0001912c:  1afffee7  bne  #0x18cd0


and the same here.

yes, qemu traces are 'incomplete'.




R00=0004b000 R01=08002108 R02=00049e40 R03=0004b000
R04=0004a8e0 R05=0808 R06=00049418 

Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-23 Thread Richard Earnshaw (lists)

On 23/10/2019 09:28, Christophe Lyon wrote:

On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:

On 21/10/2019 12:51, Christophe Lyon wrote:

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system was designed in 
such a way that it will run the GCC testsuite after each of your 
patches, so I'll keep in mind not to report regressions (I've noticed 
several already).



I can perform a manual validation taking your 29 patches as a single 
one and compare the results with those of the revision preceding the 
one where you committed patch #1. Do you think it would be useful?



Christophe




I think if you can filter out any that are removed by later patches 
and then report against the patch that caused the regression itself 
then that would be the best.  But I realise that would be more work 
for you, so a round-up against the combined set would be OK.


BTW, I'm aware of an issue with the compiler now generating

  reg, reg, shift 

in Thumb2; no need to report that again.

Thanks,
R.
.




Hi Richard,

The validation of the whole set shows 1 regression, which was also 
reported by the validation of r277179 (early split most DImode 
comparison operations)


When GCC is configured as:
--target arm-none-eabi
--with-mode default
--with-cpu default
--with-fpu default
(that is, no --with-mode, --with-cpu, --with-fpu option)
I'm using binutils-2.28 and newlib-3.1.0

I can see:
FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
(whatever -std=gnu++XX option)


That's strange.  The assembler code generated for that test is unchanged 
from before the patch series, so I can't see how it can't be a problem 
in the test itself.  What's more, I can't seem to reproduce this myself.


Similarly, in my build the code for _Znwj, malloc, malloc_r and free_r 
are also unchanged, while the malloc_[un]lock functions are empty stubs 
(not surprising as we aren't multi-threaded).


So the only thing that looks to have really changed are the linker 
offsets (some of the library code has changed, but I don't think it's 
really reached in practice, so shouldn't be relevant).




I'm executing the tests using qemu-4.1.0 -cpu arm926
The qemu traces show that code enters main, then _Znwj (operator new),
then _malloc_r

The qemu traces end with:


What do you mean by 'end with'?  What's the failure mode of the test?  A 
crash, or the test exiting with a failure code?



IN: _malloc_r
0x00019224:  e3a00ffe  mov  r0, #0x3f8
0x00019228:  e3a0c07f  mov  ip, #0x7f
0x0001922c:  e3a0e07e  mov  lr, #0x7e
0x00019230:  eafffe41  b    #0x18b3c

R00=00049418 R01= R02=0554 R03=0004
R04= R05=0808 R06=00049418 R07=
R08= R09= R10=000492d8 R11=fffeb4b4
R12=0060 R13=fffeb460 R14=00018b14 R15=00019224
PSR=2010 --C- A usr32

IN: _malloc_r
0x00018b3c:  e59f76f8  ldr  r7, [pc, #0x6f8]
0x00018b40:  e087  add  r0, r7, r0
0x00018b44:  e5903004  ldr  r3, [r0, #4]
0x00018b48:  e248  sub  r0, r0, #8
0x00018b4c:  e153  cmp  r0, r3
0x00018b50:  1a05  bne  #0x18b6c


But this block neither jumps to, nor falls through to 

R00=03f8 R01= R02=0554 R03=0004
R04= R05=0808 R06=00049418 R07=
R08= R09= R10=000492d8 R11=fffeb4b4
R12=007f R13=fffeb460 R14=007e R15=00018b3c
PSR=2010 --C- A usr32
R00=00049c30 R01= R02=0554 R03=00049c30
R04= R05=0808 R06=00049418 R07=00049840
R08= R09= R10=000492d8 R11=fffeb4b4
R12=007f R13=fffeb460 R14=007e R15=00018b54
PSR=6010 -ZC- A usr32

IN: _malloc_r


...here.  So there's some trace missing by the looks of it; or some 
other problem.



0x00019120:  e1a02a0b  lsl  r2, fp, #0x14
0x00019124:  e1a02a22  lsr  r2, r2, #0x14
0x00019128:  e352  cmp  r2, #0
0x0001912c:  1afffee7  bne  #0x18cd0


and the same here.


R00=0004b000 R01=08002108 R02=00049e40 R03=0004b000
R04=0004a8e0 R05=0808 R06=00049418 R07=00049840
R08=08001000 R09=0720 R10=00049e0c R11=0004b000
R12=007f R13=fffeb460 R14=00018ca0 R15=00019120
PSR=6010 -ZC- A usr32

IN: _malloc_r
0x00019130:  e5974008  ldr  r4, [r7, #8]
0x00019134:  e0898008  add  r8, sb, r8
0x00019138:  e3888001  orr  r8, r8, #1
0x0001913c:  e5848004  str  r8, [r4, #4]
0x00019140:  ea14  b    

Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-23 Thread Christophe Lyon

On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:

On 21/10/2019 12:51, Christophe Lyon wrote:

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system was designed in such a way 
that it will run the GCC testsuite after each of your patches, so I'll keep in 
mind not to report regressions (I've noticed several already).


I can perform a manual validation taking your 29 patches as a single one and 
compare the results with those of the revision preceding the one where you
committed patch #1. Do you think it would be useful?


Christophe




I think if you can filter out any that are removed by later patches and then 
report against the patch that caused the regression itself then that would be 
the best.  But I realise that would be more work for you, so a round-up against 
the combined set would be OK.

BTW, I'm aware of an issue with the compiler now generating

  reg, reg, shift 

in Thumb2; no need to report that again.

Thanks,
R.
.




Hi Richard,

The validation of the whole set shows 1 regression, which was also reported by 
the validation of r277179 (early split most DImode comparison operations)

When GCC is configured as:
--target arm-none-eabi
--with-mode default
--with-cpu default
--with-fpu default
(that is, no --with-mode, --with-cpu, --with-fpu option)
I'm using binutils-2.28 and newlib-3.1.0

I can see:
FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
(whatever -std=gnu++XX option)

I'm executing the tests using qemu-4.1.0 -cpu arm926
The qemu traces show that code enters main, then _Znwj (operator new), then
_malloc_r
The qemu traces end with:
IN: _malloc_r
0x00019224:  e3a00ffe  mov  r0, #0x3f8
0x00019228:  e3a0c07f  mov  ip, #0x7f
0x0001922c:  e3a0e07e  mov  lr, #0x7e
0x00019230:  eafffe41  b    #0x18b3c

R00=00049418 R01= R02=0554 R03=0004
R04= R05=0808 R06=00049418 R07=
R08= R09= R10=000492d8 R11=fffeb4b4
R12=0060 R13=fffeb460 R14=00018b14 R15=00019224
PSR=2010 --C- A usr32

IN: _malloc_r
0x00018b3c:  e59f76f8  ldr  r7, [pc, #0x6f8]
0x00018b40:  e087  add  r0, r7, r0
0x00018b44:  e5903004  ldr  r3, [r0, #4]
0x00018b48:  e248  sub  r0, r0, #8
0x00018b4c:  e153  cmp  r0, r3
0x00018b50:  1a05  bne  #0x18b6c

R00=03f8 R01= R02=0554 R03=0004
R04= R05=0808 R06=00049418 R07=
R08= R09= R10=000492d8 R11=fffeb4b4
R12=007f R13=fffeb460 R14=007e R15=00018b3c
PSR=2010 --C- A usr32
R00=00049c30 R01= R02=0554 R03=00049c30
R04= R05=0808 R06=00049418 R07=00049840
R08= R09= R10=000492d8 R11=fffeb4b4
R12=007f R13=fffeb460 R14=007e R15=00018b54
PSR=6010 -ZC- A usr32

IN: _malloc_r
0x00019120:  e1a02a0b  lsl  r2, fp, #0x14
0x00019124:  e1a02a22  lsr  r2, r2, #0x14
0x00019128:  e352  cmp  r2, #0
0x0001912c:  1afffee7  bne  #0x18cd0

R00=0004b000 R01=08002108 R02=00049e40 R03=0004b000
R04=0004a8e0 R05=0808 R06=00049418 R07=00049840
R08=08001000 R09=0720 R10=00049e0c R11=0004b000
R12=007f R13=fffeb460 R14=00018ca0 R15=00019120
PSR=6010 -ZC- A usr32

IN: _malloc_r
0x00019130:  e5974008  ldr  r4, [r7, #8]
0x00019134:  e0898008  add  r8, sb, r8
0x00019138:  e3888001  orr  r8, r8, #1
0x0001913c:  e5848004  str  r8, [r4, #4]
0x00019140:  ea14  b    #0x18d98

R00=0004b000 R01=08002108 R02= R03=0004b000
R04=0004a8e0 R05=0808 R06=00049418 R07=00049840
R08=08001000 R09=0720 R10=00049e0c R11=0004b000
R12=007f R13=fffeb460 R14=00018ca0 R15=00019130
PSR=6010 -ZC- A usr32
R00=0004b000 R01=08002108 R02= R03=0004b000
R04=0004a8e0 R05=0808 R06=00049418 R07=00049840
R08=08001721 R09=0720 R10=00049e0c R11=0004b000
R12=007f R13=fffeb460 R14=00018ca0 R15=00018d98
PSR=6010 -ZC- A usr32

Christophe



Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-21 Thread Richard Earnshaw (lists)

On 21/10/2019 12:51, Christophe Lyon wrote:

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system was designed in such a 
way that it will run the GCC testsuite after each of your patches, so 
I'll keep in mind not to report regressions (I've noticed several already).



I can perform a manual validation taking your 29 patches as a single one 
and compare the results with those of the revision preceding the one 
where you committed patch #1. Do you think it would be useful?



Christophe




I think if you can filter out any that are removed by later patches and 
then report against the patch that caused the regression itself then 
that would be the best.  But I realise that would be more work for you, 
so a round-up against the combined set would be OK.


BTW, I'm aware of an issue with the compiler now generating

 reg, reg, shift 

in Thumb2; no need to report that again.

Thanks,
R.


Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-21 Thread Christophe Lyon

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system was designed in such a way 
that it will run the GCC testsuite after each of your patches, so I'll keep in 
mind not to report regressions (I've noticed several already).


I can perform a manual validation taking your 29 patches as a single one and 
compare the results with those of the revision preceding the one where you
committed patch #1. Do you think it would be useful?


Christophe




Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-21 Thread Segher Boessenkool
On Sun, Oct 20, 2019 at 12:21:21PM +0100, Richard Earnshaw (lists) wrote:
> On 19/10/2019 17:31, Segher Boessenkool wrote:
> > I have a bunch of testcases from when I did something similar for PowerPC
> > that I wanted to test...  But I cannot get your series to apply.  Do you
> > have a git repo I can pull from?
> 
> Perhaps because it's already committed to trunk?

Oh probably.  Duh.  Thanks :-)

> > u64 addH(u64 a) { return a + 0x12345678ULL; }
> > u64 addH0(u64 a) { return a + 0x1234ULL; }

(If you change those to 0x340078ULL etc. it'll test as meant on arm
as well: to see if it uses immediates in the insn where it can.  It looks
like it'll work fine fwiw).
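
As background for the constants mentioned above: in Arm (A32) state a data-processing immediate is an 8-bit value rotated right by an even number of bits, so 0x12345678 and 0x1234 cannot be encoded in a single instruction, whereas 0x340078 splits into two encodable pieces (0x340000 and 0x78). A minimal sketch of that check follows; `is_arm_immediate` and `rotl32` are illustrative helper names, not GCC functions, and the sketch does not model the different Thumb2 immediate scheme.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* True if c can be written as an 8-bit value rotated right by an even
   amount, i.e. undoing some even rotation leaves a value <= 0xff. */
bool is_arm_immediate(uint32_t c)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotl32(c, rot) <= 0xffu)
            return true;
    }
    return false;
}
```

With this helper, `is_arm_immediate(0x340000)` and `is_arm_immediate(0x78)` both hold, while `is_arm_immediate(0x340078)` does not, which is why such a constant makes a useful test of whether each half of a 64-bit addition can use an immediate operand.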

> We do pretty well on this.  Only addSHm1 needs three insns (except where
> the constant isn't valid for arm), and I think that's the minimum for
> this case anyway.  Several of the tests only need one insn.

Yeah, very nice :-)


Segher


Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-20 Thread Ramana Radhakrishnan
On Fri, Oct 18, 2019 at 8:49 PM Richard Earnshaw
 wrote:
>
>
> This series of patches rewrites all the DImode arithmetic patterns for
> the Arm backend when compiling for Arm or Thumb2 to split the
> operations during expand (the thumb1 code is unchanged and cannot
> benefit from early splitting as we are unable to expose the carry
> flag).
>
> This has a number of benefits:
>  - register allocation has more freedom to use independent
>    registers for the upper and lower halves of the register
>  - we can make better use of combine for spotting insn merge
>    opportunities without needing many additional patterns that are
>    only used for DImode
>  - we eliminate a number of bugs in the machine description where
>    the carry calculations were not correctly propagated by the
>    split patterns (we mostly got away with this because the
>    splitting previously happened only after most of the important
>    optimization passes had been run).
>
> The patch series starts by paring back all the DImode arithmetic
> support to a very simple form without any splitting at all and then
> progressively re-implementing the patterns with early split
> operations.  This proved to be the only sane way of untangling the
> existing code due to a number of latent bugs which would have been
> exposed if a different approach had been taken.
>
> Each patch should produce a working compiler (it did when it was
> originally written), though since the patch set has been re-ordered
> slightly there is a possibility that some of the intermediate steps
> may have missing test updates that are only cleaned up later.
> However, only the end of the series should be considered complete.
> I've kept the patch as a series to permit easier regression hunting
> should that prove necessary.

Yay! It's quite nice to see this go in.

Ramana
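
For readers less familiar with the idea in the quoted cover letter, splitting a DImode (64-bit) addition into two SImode (32-bit) halves linked by the carry can be sketched in plain C. This only illustrates the arithmetic; it is not the RTL that the patches generate, and `di_halves` and `add_di_split` are hypothetical names.

```c
#include <stdint.h>

/* A 64-bit value held as two independent 32-bit halves, so that each half
   could in principle live in an unrelated register. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} di_halves;

di_halves add_di_split(di_halves a, di_halves b)
{
    di_halves r;

    r.lo = a.lo + b.lo;               /* low halves; conceptually ADDS */
    uint32_t carry = (r.lo < a.lo);   /* 1 if the low-half add wrapped */
    r.hi = a.hi + b.hi + carry;       /* high halves plus carry; conceptually ADC */
    return r;
}
```

The point of the sketch is that the only link between the two halves is the single carry bit, which is what has to be exposed correctly once the operation is split early.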


>
> R.
>
> Richard Earnshaw (29):
>   [arm] Rip out DImode addition and subtraction splits.
>   [arm] Perform early splitting of adddi3.
>   [arm] Early split zero- and sign-extension
>   [arm] Rewrite addsi3_carryin_shift_ in canonical form
>   [arm] fix constraints on addsi3_carryin_alt2
>   [arm] Early split subdi3
>   [arm] Remove redundant DImode subtract patterns
>   [arm] Introduce arm_carry_operation
>   [arm] Correctly cost addition with a carry-in
>   [arm] Correct cost calculations involving borrow for subtracts.
>   [arm] Reduce cost of insns that are simple reg-reg moves.
>   [arm] Implement negscc using SBC when appropriate.
>   [arm] Add alternative canonicalizations for subtract-with-carry +
> shift
>   [arm] Early split simple DImode equality comparisons
>   [arm] Improve handling of DImode comparisions against constants.
>   [arm] early split most DImode comparison operations.
>   [arm] Handle some constant comparisons using rsbs+rscs
>   [arm] Cleanup dead code - old support for DImode comparisons
>   [arm] Handle immediate values in uaddvsi4
>   [arm] Early expansion of uaddvdi4.
>   [arm] Improve code generation for addvsi4.
>   [arm] Allow the summation result of signed add-with-overflow to be
> discarded.
>   [arm] Early split addvdi4
>   [arm] Improve constant handling for usubvsi4.
>   [arm] Early expansion of usubvdi4.
>   [arm] Improve constant handling for subvsi4.
>   [arm] Early expansion of subvdi4
>   [arm] Improvements to negvsi4 and negvdi4.
>   [arm] Fix testsuite nit when compiling for thumb2
>
>  gcc/config/arm/arm-modes.def                  |   19 +-
>  gcc/config/arm/arm-protos.h                   |    1 +
>  gcc/config/arm/arm.c                          |  598 -
>  gcc/config/arm/arm.md                         | 2020 ++---
>  gcc/config/arm/iterators.md                   |   15 +-
>  gcc/config/arm/predicates.md                  |   29 +-
>  gcc/config/arm/thumb2.md                      |    8 +-
>  .../gcc.dg/builtin-arith-overflow-3.c         |   41 +
>  gcc/testsuite/gcc.target/arm/negdi-3.c        |    4 +-
>  9 files changed, 1757 insertions(+), 978 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-arith-overflow-3.c
>


Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-20 Thread Richard Earnshaw (lists)
On 19/10/2019 17:31, Segher Boessenkool wrote:
> Hi Richard,
> 
> On Fri, Oct 18, 2019 at 08:48:31PM +0100, Richard Earnshaw wrote:
>>
>> This series of patches rewrites all the DImode arithmetic patterns for
>> the Arm backend when compiling for Arm or Thumb2 to split the
>> operations during expand (the thumb1 code is unchanged and cannot
>> benefit from early splitting as we are unable to expose the carry
>> flag).
> 
> Very nice :-)
> 
> I have a bunch of testcases from when I did something similar for PowerPC
> that I wanted to test...  But I cannot get your series to apply.  Do you
> have a git repo I can pull from?
> 

Perhaps because it's already committed to trunk?


> Here is one test case (it's a bit geared towards what our ISA can do):
> 
> ===
> typedef unsigned int u32;
> typedef unsigned long long u64;
> 
> u64 add(u64 a, u64 b) { return a + b; }
> u64 add1(u64 a) { return a + 1; }
> u64 add42(u64 a) { return a + 42; }
> u64 addm1(u64 a) { return a - 1; }
> u64 addff(u64 a) { return a + 0xULL; }
> u64 addH(u64 a) { return a + 0x12345678ULL; }
> u64 addH0(u64 a) { return a + 0x1234ULL; }
> u64 addS(u64 a, u32 b) { return a + b; }
> u64 addSH(u64 a, u32 b) { return a + ((u64)b << 32); }
> u64 addB1(u64 a) { return a + 0x1ULL; }
> u64 addB8(u64 a) { return a + 0x8ULL; }
> 
> u64 addSH42(u64 a, u32 b) { return a + ((u64)b << 32) + 42; }
> u64 addSHm1(u64 a, u32 b) { return a + ((u64)b << 32) - 1; }
> u64 addSHff(u64 a, u32 b) { return a + ((u64)b << 32) + 0xULL; }
> ===
> 
> rs6000 -m32 currently has non-optimal code for addm1, addSHm1; trunk arm
> has non-optimal code for addH0, addSH, addB1, addB8, addSH42, addSHm1, and
> addSHff if I understand well enough.  So I'd love to see what it does with
> your series applied :-)
> 
> 
> Segher
> 

We do pretty well on this.  Only addSHm1 needs three insns (except where
the constant isn't valid for arm), and I think that's the minimum for
this case anyway.  Several of the tests only need one insn.

R.
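
For readers following the listing below, here is a minimal C sketch of how
the flag-setting pairs compose a 64-bit result from two 32-bit halves.  The
helper names (pair64, arm_adds, arm_adc, arm_subs, arm_sbc and the *_split
functions) are illustrative only and do not come from the patch series.  The
model relies on the Arm convention that the carry flag after a subtract is
NOT borrow.  The last function assumes a constant whose low word is all ones
and whose high word is zero, which is what a subs #1 / adc #0 pair would
compute.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit value held in two 32-bit registers. */
typedef struct { uint32_t lo, hi; } pair64;

static pair64 split64(uint64_t v)
{
    pair64 p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t join64(pair64 p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* ADDS: 32-bit add; carry is set on unsigned overflow. */
static uint32_t arm_adds(uint32_t a, uint32_t b, int *carry)
{
    uint32_t r = a + b;
    *carry = r < a;
    return r;
}

/* ADC: a + b + C. */
static uint32_t arm_adc(uint32_t a, uint32_t b, int carry)
{
    return a + b + (uint32_t)carry;
}

/* SUBS: 32-bit subtract; carry is NOT borrow, so it is set when a >= b. */
static uint32_t arm_subs(uint32_t a, uint32_t b, int *carry)
{
    *carry = a >= b;
    return a - b;
}

/* SBC: a - b - (1 - C). */
static uint32_t arm_sbc(uint32_t a, uint32_t b, int carry)
{
    return a - b - (uint32_t)!carry;
}

/* a + b as an ADDS/ADC pair. */
static uint64_t add_split(uint64_t a, uint64_t b)
{
    int c;
    pair64 x = split64(a), y = split64(b), r;
    r.lo = arm_adds(x.lo, y.lo, &c);
    r.hi = arm_adc(x.hi, y.hi, c);
    return join64(r);
}

/* a - 1 as a SUBS/SBC pair. */
static uint64_t dec_split(uint64_t a)
{
    int c;
    pair64 p = split64(a);
    p.lo = arm_subs(p.lo, 1, &c);
    p.hi = arm_sbc(p.hi, 0, c);
    return join64(p);
}

/* a + 0x00000000ffffffff as SUBS #1 / ADC #0 (assumed constant, see above). */
static uint64_t add_low_ones_split(uint64_t a)
{
    int c;
    pair64 p = split64(a);
    p.lo = arm_subs(p.lo, 1, &c);
    p.hi = arm_adc(p.hi, 0, c);
    return join64(p);
}

/* Compare the word-level model with native 64-bit arithmetic. */
int check_split_model(void)
{
    static const uint64_t samples[] = {
        0u, 1u, 0xffffffffull, 0x100000000ull,
        0x123456789abcdef0ull, UINT64_MAX
    };
    unsigned n = sizeof samples / sizeof samples[0];
    for (unsigned i = 0; i < n; i++) {
        uint64_t a = samples[i];
        assert(dec_split(a) == a - 1);
        assert(add_low_ones_split(a) == a + 0xffffffffull);
        for (unsigned j = 0; j < n; j++)
            assert(add_split(a, samples[j]) == a + samples[j]);
    }
    return 1;
}
```

Calling check_split_model() from a small main() is enough to exercise the
model; it says nothing about which patterns the compiler actually matches.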
.arch armv8-a
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file   "lltest.c"
.text
.align  2
.global add
.syntax unified
.arm
.fpu softvfp
.type   add, %function
add:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
adds r0, r0, r2
adc r1, r1, r3
bx  lr
.size   add, .-add
.align  2
.global add1
.syntax unified
.arm
.fpu softvfp
.type   add1, %function
add1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
adds r0, r0, #1
adc r1, r1, #0
bx  lr
.size   add1, .-add1
.align  2
.global add42
.syntax unified
.arm
.fpu softvfp
.type   add42, %function
add42:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
adds r0, r0, #42
adc r1, r1, #0
bx  lr
.size   add42, .-add42
.align  2
.global addm1
.syntax unified
.arm
.fpu softvfp
.type   addm1, %function
addm1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
subs r0, r0, #1
sbc r1, r1, #0
bx  lr
.size   addm1, .-addm1
.align  2
.global addff
.syntax unified
.arm
.fpu softvfp
.type   addff, %function
addff:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
subs r0, r0, #1
adc r1, r1, #0
bx  lr
.size   addff, .-addff
.align  2
.global addH
.syntax unified
.arm
.fpu softvfp
.type   addH, %function
addH:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
movw r3, #22136
adds r0, r0, r3
movw r3, #4660
adc r1, r3, r1
bx  lr
.size   addH, .-addH
.align  2
.global addH0
.syntax unified
.arm
.fpu softvfp
.type   addH0, %function
addH0:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
add r1, r1, #4608
add r1, r1, 

Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-19 Thread Segher Boessenkool
Hi Richard,

On Fri, Oct 18, 2019 at 08:48:31PM +0100, Richard Earnshaw wrote:
> 
> This series of patches rewrites all the DImode arithmetic patterns for
> the Arm backend when compiling for Arm or Thumb2 to split the
> operations during expand (the thumb1 code is unchanged and cannot
> benefit from early splitting as we are unable to expose the carry
> flag).

Very nice :-)

I have a bunch of testcases from when I did something similar for PowerPC
that I wanted to test...  But I cannot get your series to apply.  Do you
have a git repo I can pull from?

Here is one test case (it's a bit geared towards what our ISA can do):

===
typedef unsigned int u32;
typedef unsigned long long u64;

u64 add(u64 a, u64 b) { return a + b; }
u64 add1(u64 a) { return a + 1; }
u64 add42(u64 a) { return a + 42; }
u64 addm1(u64 a) { return a - 1; }
u64 addff(u64 a) { return a + 0xULL; }
u64 addH(u64 a) { return a + 0x12345678ULL; }
u64 addH0(u64 a) { return a + 0x1234ULL; }
u64 addS(u64 a, u32 b) { return a + b; }
u64 addSH(u64 a, u32 b) { return a + ((u64)b << 32); }
u64 addB1(u64 a) { return a + 0x1ULL; }
u64 addB8(u64 a) { return a + 0x8ULL; }

u64 addSH42(u64 a, u32 b) { return a + ((u64)b << 32) + 42; }
u64 addSHm1(u64 a, u32 b) { return a + ((u64)b << 32) - 1; }
u64 addSHff(u64 a, u32 b) { return a + ((u64)b << 32) + 0xULL; }
===

rs6000 -m32 currently has non-optimal code for addm1, addSHm1; trunk arm
has non-optimal code for addH0, addSH, addB1, addB8, addSH42, addSHm1, and
addSHff if I understand well enough.  So I'd love to see what it does with
your series applied :-)
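
As a reference point for the addSH-style cases, here is a small C sketch
(the *_ref and *_words helpers are illustrative names, not part of the patch
series).  It shows that adding (u64)b << 32 leaves the low word untouched, so
the word-level work is a single 32-bit add on the high word, and that the
addSHm1 variant breaks down into three 32-bit steps.  This only models the
arithmetic; it does not claim to be the sequence any compiler emits.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Reference versions, written as in the test case above. */
static u64 addSH_ref(u64 a, u32 b)   { return a + ((u64)b << 32); }
static u64 addSHm1_ref(u64 a, u32 b) { return a + ((u64)b << 32) - 1; }

/* Word-level addSH: only the high word changes (one 32-bit add). */
static u64 addSH_words(u64 a, u32 b)
{
    u32 lo = (u32)a;
    u32 hi = (u32)(a >> 32) + b;   /* wraps modulo 2^32, like the u64 add */
    return ((u64)hi << 32) | lo;
}

/* Word-level addSHm1: decrement low, propagate the borrow, add b to high. */
static u64 addSHm1_words(u64 a, u32 b)
{
    u32 lo = (u32)a;
    u32 hi = (u32)(a >> 32);
    u32 borrow = (lo == 0);        /* low word underflows only from zero */
    lo = lo - 1;                   /* step 1 */
    hi = hi - borrow;              /* step 2 */
    hi = hi + b;                   /* step 3 */
    return ((u64)hi << 32) | lo;
}

/* Compare the word-level versions with the native 64-bit references. */
int check_addSH_words(void)
{
    static const u64 as[] = { 0u, 1u, 0xffffffffull, 0x100000000ull,
                              0xfedcba9876543210ull, UINT64_MAX };
    static const u32 bs[] = { 0u, 1u, 42u, 0x80000000u, 0xffffffffu };
    for (unsigned i = 0; i < sizeof as / sizeof as[0]; i++)
        for (unsigned j = 0; j < sizeof bs / sizeof bs[0]; j++) {
            assert(addSH_words(as[i], bs[j]) == addSH_ref(as[i], bs[j]));
            assert(addSHm1_words(as[i], bs[j]) == addSHm1_ref(as[i], bs[j]));
        }
    return 1;
}
```

Calling check_addSH_words() from a small main() runs the comparison over a
handful of boundary values.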


Segher