[Bug target/115065] AVR clz is not always fast as can be

2024-05-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115065

Georg-Johann Lay  changed:

   What|Removed |Added

   Priority|P3  |P5
   Severity|normal  |minor
   Target Milestone|--- |14.2
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Georg-Johann Lay  ---
Fixed in v14.2+

[Bug target/115065] AVR clz is not always fast as can be

2024-05-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115065

--- Comment #1 from Georg-Johann Lay  ---
IIUC, this is just about the timing of a branch, which in the general case
(!= 0) is currently taken (takes 2 ticks), whereas it would be better to only
take it in the non-common (= 0) case?  So that the common case falls through
and thus takes 1 cycle less.

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2024-05-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Georg-Johann Lay  changed:

   What|Removed |Added

   Target Milestone|10.5|12.3

[Bug target/115084] Missed optimization in division for AVR target, not using __*divmodpsi4

2024-05-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115084

--- Comment #3 from Georg-Johann Lay  ---
I don't see what the avr backend can do about it; it's rather a middle-end
thing.  And the middle-end would have to know that there is a 24-bit integral
mode in the backend and that its division is preferred over 32-bit division...

[Bug target/114981] [avr] Improve powi implementation

2024-05-10 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114981

Georg-Johann Lay  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |14.2
 Resolution|--- |FIXED

--- Comment #7 from Georg-Johann Lay  ---
The float variant is tweaked in v14.2+

The double variant is supported since v13.3+ and v14.2+.

[Bug target/114975] [AVR] Using popcounthi2 for 8-bit values despite popcountqi2

2024-05-09 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114975

Georg-Johann Lay  changed:

   What|Removed |Added

   Target Milestone|15.0|14.2

--- Comment #6 from Georg-Johann Lay  ---
Fixed in v14.2+

[Bug target/114975] [AVR] Using popcounthi2 for 8-bit values despite popcountqi2

2024-05-08 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114975

Georg-Johann Lay  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
  Component|middle-end  |target
   Target Milestone|--- |15.0

--- Comment #3 from Georg-Johann Lay  ---
Fixed in v15.

[Bug target/114981] [avr] Improve powi implementation

2024-05-08 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114981

--- Comment #1 from Georg-Johann Lay  ---
(In reply to Georg-Johann Lay from comment #0)
> ... due to PR11093 ...

PR110093

[Bug target/114981] New: [avr] Improve powi implementation

2024-05-08 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114981

Bug ID: 114981
   Summary: [avr] Improve powi implementation
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

powif() and powi() can be improved in functionality

* __powisf2 is in open coded C, which bloats due to PR11093 or PR114243 or
similar register allocation flaws.

* __powidf2 is not implemented.
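
For orientation, an integer power is usually computed by square-and-multiply. The following is a minimal C sketch under that assumption; the name powi_sketch is hypothetical, and this is not the avr libgcc code.

```c
/* Hypothetical sketch: x raised to the integer power m, computed by
   square-and-multiply.  Not the libgcc implementation. */
float powi_sketch (float x, int m)
{
    /* Work on |m| as unsigned so that the most negative int does not overflow. */
    unsigned int n = m < 0 ? -(unsigned int) m : (unsigned int) m;
    float y = (n & 1) ? x : 1.0f;

    while (n >>= 1)
    {
        x = x * x;          /* x now holds the next power-of-two power */
        if (n & 1)
            y *= x;         /* this bit of the exponent is set */
    }
    return m < 0 ? 1.0f / y : y;
}
```

The loop runs once per bit of the exponent, so the number of multiplications grows with log2 of |m| rather than with |m| itself.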

[Bug target/114835] AVR popcountqi2 is not fast as can be

2024-05-07 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114835

--- Comment #3 from Georg-Johann Lay  ---
(In reply to Wolfgang Hospital from comment #0)
> When establishing the "popcount" of an uint8_t, I've seen GCC to widen the
> value to "half int" and use __popcountqi2 twice.

This is a different issue, please f'up PR114975.

[Bug target/114835] AVR popcountqi2 is not fast as can be

2024-05-07 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114835

Georg-Johann Lay  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Georg-Johann Lay  ---
Fixed in v15.

[Bug middle-end/114975] New: [AVR] Using popcounthi2 for 8-bit values despite popcountqi2

2024-05-07 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114975

Bug ID: 114975
   Summary: [AVR] Using popcounthi2 for 8-bit values despite
popcountqi2
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

C test case

typedef __UINT8_TYPE__ uint8_t;

uint8_t use_pop (uint8_t x)
{
    return __builtin_popcount (x);
}

compiles with

$ avr-gcc pop.c -Os -S

to:

use_pop:
ldi r25,0
rcall __popcounthi2
ret

.ident  "GCC: (GNU) 14.0.1 20240421 (experimental)"

despite libgcc providing __popcountqi2.

I am not even sure which component is supposed to treat this.  The tree
optimizers do only __builtin_popcount, and as there's nothing like popcount_u8
they cannot do anything about it.  So blaming the middle-end for now.
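
Why an 8-bit routine would suffice here: zero-extending a byte adds no set bits, so the popcount of the 8-bit value equals the popcount of the widened value. A minimal portable C sketch of that equivalence follows; popcount8 is a hypothetical stand-in and not libgcc's __popcountqi2.

```c
#include <stdint.h>

/* Hypothetical 8-bit popcount (bit-parallel sum); a stand-in for what an
   8-bit library routine would compute. */
uint8_t popcount8 (uint8_t x)
{
    x = (x & 0x55) + ((x >> 1) & 0x55);   /* sums of adjacent bits */
    x = (x & 0x33) + ((x >> 2) & 0x33);   /* sums of adjacent bit pairs */
    x = (x & 0x0f) + ((x >> 4) & 0x0f);   /* sum of the two nibbles */
    return x;
}

/* The widened form: x is promoted to int before the builtin counts bits. */
uint8_t popcount_via_int (uint8_t x)
{
    return (uint8_t) __builtin_popcount (x);
}
```

Both functions agree for every 8-bit input, which is why widening to 16 bits only costs the extra instruction that clears the high byte.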

[Bug ipa/92606] [11/12/13 Regression][avr] invalid merge of symbols in progmem and data sections

2024-05-07 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92606

--- Comment #36 from Georg-Johann Lay  ---
Installed a work-around for v14.2+, v13.3+ and v12.4+

The work-around can be reverted once a proper fix like PR92932 is available.

[Bug target/114794] [avr] Speed up udivmodqi4

2024-04-21 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114794

Georg-Johann Lay  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |13.3

--- Comment #3 from Georg-Johann Lay  ---
Fixed in v13.3+

[Bug target/114794] [avr] Speed up udivmodqi4

2024-04-21 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114794

Georg-Johann Lay  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Priority|P3  |P5
 Target||avr

[Bug target/114794] New: [avr] Speed up udivmodqi4

2024-04-21 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114794

Bug ID: 114794
   Summary: [avr] Speed up udivmodqi4
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

udivmodqi4 is slower than it could be.
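
For readers unfamiliar with such routines: an 8-bit unsigned divmod is commonly a shift-and-subtract loop with one iteration per bit. The C sketch below illustrates that general technique only; the names divmod8 and udivmod8_sketch are hypothetical, and it is neither the libgcc code nor the improved version.

```c
#include <stdint.h>

struct divmod8 { uint8_t quot, rem; };

/* Hypothetical sketch of restoring (shift-and-subtract) division.
   The result is meaningless for den == 0. */
struct divmod8 udivmod8_sketch (uint8_t num, uint8_t den)
{
    unsigned rem = 0;   /* wider than 8 bits: the shifted remainder may need 9 */

    for (int i = 0; i < 8; i++)
    {
        /* Shift the next dividend bit (MSB first) into the remainder. */
        rem = (rem << 1) | (num >> 7);
        num = (uint8_t) (num << 1);

        /* If the divisor fits, subtract it and record a quotient bit. */
        if (rem >= den)
        {
            rem -= den;
            num |= 1;
        }
    }
    /* The dividend register has been replaced bit by bit by the quotient. */
    return (struct divmod8) { num, (uint8_t) rem };
}
```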

[Bug tree-optimization/114779] __builtin_constant_p does not work in inline functions

2024-04-19 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114779

Georg-Johann Lay  changed:

   What|Removed |Added

   Keywords||documentation

--- Comment #10 from Georg-Johann Lay  ---
so I set keyword "documentation"

[Bug tree-optimization/114779] __builtin_constant_p does not work in inline functions

2024-04-19 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114779

--- Comment #9 from Georg-Johann Lay  ---
If this PR won't be fixed, then maybe at least the documentation could
clarify how to port macros to inline functions.

[Bug tree-optimization/114779] __builtin_constant_p does not work in inline functions

2024-04-19 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114779

--- Comment #6 from Georg-Johann Lay  ---
Recognizing more __builtin_constant_p situations is a good thing, IMO.

It would allow a transition from macros to inline functions in such
situations, for example in inline asm that has extra opcodes for const
addresses.
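
To make the macro versus inline-function distinction concrete, here is a minimal sketch. The names PSFR_IS_CONSTANT, psfr_is_constant, check_macro and check_inline are hypothetical. What the inline form returns depends on the compiler version and optimization level, so the sketch makes no claim about it.

```c
#define SFR (*(volatile int *) 0x100)

/* Macro form: __builtin_constant_p sees the argument expression itself. */
#define PSFR_IS_CONSTANT(p) __builtin_constant_p (p)

/* Inline-function form: __builtin_constant_p sees the parameter psfr, so
   the result depends on whether the compiler propagates the constant into
   the inlined body before it resolves the builtin. */
static __inline__ __attribute__((__always_inline__))
int psfr_is_constant (volatile int *psfr)
{
    return __builtin_constant_p (psfr);
}

/* An integer literal passed through the macro is recognized as constant. */
int check_macro (void)
{
    return PSFR_IS_CONSTANT (0x100);
}

/* Taking the address does not access the SFR; the result is 0 or 1. */
int check_inline (void)
{
    return psfr_is_constant (& SFR);
}
```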

[Bug tree-optimization/114779] __builtin_constant_p does not work in inline functions

2024-04-19 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114779

--- Comment #4 from Georg-Johann Lay  ---
As far as I understand, & SFR has no side effects.

But when it is used as argument to an (inline) function, then it does have side
effects?

[Bug tree-optimization/114779] __builtin_constant_p does not work in inline functions

2024-04-19 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114779

--- Comment #2 from Georg-Johann Lay  ---
Notice that when & SFR is used directly in __builtin_constant_p without an
inline function, then the code works as expected:

int main (void)
{
    if (__builtin_constant_p (& SFR))
        __asm (".warning \"psfr = %0 is constant\"" :: "n" (& SFR));
    return 0;
}

$ gcc bar.c -c -O2
Assembler messages:
Warning: psfr = $256 is constant

[Bug tree-optimization/114779] New: __builtin_constant_p does not work in inline functions

2024-04-19 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114779

Bug ID: 114779
   Summary: __builtin_constant_p does not work in inline functions
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

Take the following C test case with a special function register (SFR)
definition at a constant address:

#define SFR (*(volatile int*) 0x100)

static __inline__ __attribute__((__always_inline__))
void test_bcp (int volatile *psfr)
{
    if (! __builtin_constant_p (psfr))
        __asm (".error \"psfr is not constant\"");
}

int main (void)
{
    test_bcp (& SFR);
    return 0;
}

Then compile with:

$ gcc bar.c -c -O2
Assembler messages:
Error: psfr is not constant

[Bug target/114752] AVR: internal compiler error. Unknown mode: const_double:DF

2024-04-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114752

Georg-Johann Lay  changed:

   What|Removed |Added

   Target Milestone|--- |13.3
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Georg-Johann Lay  ---
Fixed in v13.3+

[Bug target/114752] AVR: internal compiler error. Unknown mode: const_double:DF

2024-04-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114752

Georg-Johann Lay  changed:

   What|Removed |Added

   Priority|P3  |P5
   Keywords||ice-on-valid-code
 Target||avr

[Bug target/114752] New: AVR: internal compiler error. Unknown mode: const_double:DF

2024-04-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114752

Bug ID: 114752
   Summary: AVR: internal compiler error. Unknown mode:
const_double:DF
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

Inline asm does not accept 64-bit float constants:

void func (void)
{
    __asm ("; %0" :: "EF" (1.0L));
}


foo.c: In function 'func':
foo.c: error: internal compiler error.  Unknown mode:
  | }
  | ^
(const_double:DF 1.0e+0 [0x0.8p+1])
during RTL pass: final
foo.c:4:1: internal compiler error: in avr_print_operand, at
config/avr/avr.cc:3937

[Bug target/114252] Introducing bswapsi reduces code performance

2024-03-07 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

--- Comment #14 from Georg-Johann Lay  ---
The code in the example is not a perfect bswap; it needs additional shuffling
of bytes.  The tree passes must know that bswap is not a perfect fit.  There
must be *some* criterion that depends on the permutation and that tells
whether the permutation is closer to a bswap than to the original byte order.
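
The "additional shuffling" can be spelled out in portable C. The sketch below takes the permutation { buf[1], buf[0], buf[3], buf[2] } from this PR's test case and assumes little-endian byte order as on AVR. The function names perm_direct and perm_via_bswap are hypothetical.

```c
#include <stdint.h>

/* Direct form: each byte is placed where it belongs; this corresponds to
   four byte loads and nothing else. */
uint32_t perm_direct (const uint8_t *buf)
{
    return (uint32_t) buf[1]
           | (uint32_t) buf[0] << 8
           | (uint32_t) buf[3] << 16
           | (uint32_t) buf[2] << 24;
}

/* Indirect form: a 32-bit little-endian load, then a bswap, then a swap of
   the two 16-bit halves to repair what the bswap got wrong. */
uint32_t perm_via_bswap (const uint8_t *buf)
{
    uint32_t u = (uint32_t) buf[0]
                 | (uint32_t) buf[1] << 8
                 | (uint32_t) buf[2] << 16
                 | (uint32_t) buf[3] << 24;

    u = __builtin_bswap32 (u);
    return (u << 16) | (u >> 16);
}
```

Both functions return the same value, but the second one needs the bswap plus a swap of halves, which on a machine with only 8-bit registers means extra register moves.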

[Bug target/114252] Introducing bswapsi reduces code performance

2024-03-07 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

--- Comment #12 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #10)
> I think the target controls the "libcall" ABI that's used for calls to
> libgcc,

Do you have a pointer on how to do it, or an example? IIRC I looked into it
quite a while ago, and it didn't allow specifying/adjusting call_used_regs[] etc.

> I think the target should implement an inline bswap, possibly via a
> define_insn_and_split or define_split so the byte ops are only exposed
> at a desired point;  important points being lower_subreg (split-wide-types)
> and register allocation - possibly lower_subreg should itself know
> how to handle bswap (though the degenerate AVR case is quite special).

That would result in SUBREGs all over the place.  As Vladimir pointed out in 

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110093#c5

DFA doesn't handle subregs properly, and register alloc then uses extra
reloads, bloating the code (not only in PR110093 but also in PR114243).  It is
unlikely that any pass will untangle the mess of four
(set (subreg:QI (SI)) (subreg:QI (SI)))



> Yeah.  Or comparing to open-coding the bswap without going through the call.
> I don't have a AVR libgcc around, but libgcc2.s has
> 
> #ifdef L_bswapsi2
> SItype
> __bswapsi2 (SItype u)
> {
>   return ((((u) & 0xff000000u) >> 24)
>           | (((u) & 0x00ff0000u) >>  8)
>           | (((u) & 0x0000ff00u) <<  8)
>           | (((u) & 0x000000ffu) << 24));
> }
> #endif 

The libgcc side is not a problem at all, libgcc/config/avr/lib1funcs.S has:

;; swap two registers with different register number
.macro bswap a, b
eor \a, \b
eor \b, \a
eor \a, \b
.endm

#if defined (L_bswapsi2)
;; swap bytes
;; r25:r22 = bswap32 (r25:r22)
DEFUN __bswapsi2
bswap r22, r25
bswap r23, r24
ret
ENDF __bswapsi2
#endif /* defined (L_bswapsi2) */

#if defined (L_bswapdi2)
;; swap bytes
;; r25:r18 = bswap64 (r25:r18)
DEFUN __bswapdi2
bswap r18, r25
bswap r19, r24
bswap r20, r23
bswap r21, r22
ret
ENDF __bswapdi2
#endif /* defined (L_bswapdi2) */


There's currently no handcrafted bswap16 though.
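
For readers who don't read AVR assembly, the three-EOR macro above is the classic XOR swap, and __bswapsi2 applies it to the outer and the inner byte pair. A C sketch of the same idea follows; xor_swap and bswap32_bytes are hypothetical names, and the byte array stands in for registers r22..r25.

```c
#include <stdint.h>

/* XOR swap of two bytes without a temporary.  As with the assembly macro,
   a and b must refer to different objects, or the value is zeroed. */
static void xor_swap (uint8_t *a, uint8_t *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* Byte-swap a 32-bit value held as four bytes: swap the outer pair, then
   the inner pair. */
void bswap32_bytes (uint8_t r[4])
{
    xor_swap (&r[0], &r[3]);
    xor_swap (&r[1], &r[2]);
}
```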

[Bug target/114252] Introducing bswapsi reduces code performance

2024-03-07 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

--- Comment #9 from Georg-Johann Lay  ---
...and I don't see why a register allocator would or should fix flaws from tree
optimizers.

[Bug target/114252] Introducing bswapsi reduces code performance

2024-03-07 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

--- Comment #8 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #7)
> Note I do understand what you are saying, just the middle-end in detecting
> and using __builtin_bswap32 does what it does everywhere else - it checks
> whether the target implements the operation.
> 
> The middle-end doesn't try to actually compare costs (it has no idea of the
> bswapsi costs),

But even when the bswapsi insn costs nothing, the v14 code has these additional
6 movqi insns 32...37 compared to v13 code.  In order to have the same
performance like v13 code, a bswapsi would have to cost negative 6 insns.  And
an optimizer that assumes negative costs is not reasonable, in particular
because the recognition of bswap opportunities serves optimization -- or is
supposed to serve it as far as I understand.

> and it most definitely doesn't see how AVR is special in
> having only QImode registers and thus the created SImode load (which the
> target supports!) will end up as four registers.

Even when the bswap insn would cost nothing the code is worse.

> The only thing that maybe would make sense with AVR exposing bswapsi is
> users calling __builtin_bswap but since it always expands as a libcall
> even that makes no sense.

It makes perfect sense when C/C++ code uses __builtin_bswap32:

* With current bswapsi insn, the code does a call that performs SI:22 =
bswap(SI:22) with NO additional register pressure.

* Without bswap insn, the code does a real ABI call that performs SI:22 =
bswap(SI:22) PLUS IT CLOBBERS r18, r19, r20, r21, r26, r27, r30 and r31; which
are the most powerful GPRs.

> So my preferred fix would be to remove bswapsi from avr.md?

Is there a way that the backend can fold a call to an insn that performs better
than a call? Like in TARGET_FOLD_BUILTIN?  As far as I know, the backend can
only fold target builtins, but not common builtins?  Tree fold cannot fold to
an insn obviously, but it could fold to inline asm, no?

Or can the target change an optabs entry so it expands to an insn that's more
profitable than a respective call? (like avr.md's bswap insn with transparent
call is more profitable than a real call).

The avr backend does this for many other operations, too:

divmod, SI and PSI multiplications, parity, popcount, clz, ffs, 

> Does it benefit from recognizing bswap done with shifts on an int?

I don't fully understand that question. You mean to write code that shifts
bytes around like in
    uint32_t res = 0;
    res |= ((uint32_t) buf[0]) << 24;
    res |= ((uint32_t) buf[1]) << 16;
    res |= (uint32_t) buf[2] << 8;
    res |= buf[3];
    return res;
is better than a bswapsi call?

[Bug target/114252] Introducing bswapsi reduces code performance

2024-03-06 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

--- Comment #5 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #4)
> So bswap on a value is just register shuffling, right?

The point is that there is no need for bswap in the first place, just have a
look at the code that v13 generates.  It's 4 QI loads and that's it, no
shuffling required at all.

But v14 dropped that, and the bswapsi (presumably due to previous flawed tree
optimizations) is introduced by some tree pass.

There's nothing the backend can do about it.  So would you explain why you
think it's a "target" issue?

Maybe the PR title I used is confusing and does not hit the point?

[Bug target/114252] Introducing bswapsi reduces code performance

2024-03-06 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

--- Comment #3 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #1)
> but somehow we end up doing a libcall?

It's not a libcall in the GCC sense, for the compiler it's just an ordinary
insn.  The backend then prints this as a transparent call to libgcc.

The purpose is that many functions have a small, known footprint as they are
implemented in assembly. An ordinary call would clobber all callee-used regs,
so using a transparent call gives better code than a real call.  Notice this is
the insn:

(define_insn "*bswapsi2.libgcc"
  [(set (reg:SI 22)
(bswap:SI (reg:SI 22)))
   (clobber (reg:CC REG_CC))]
  "reload_completed"
  "%~call __bswapsi2"
  [(set_attr "type" "xcall")])

However, for the purpose of this PR, no bswap is needed in the 1st place; just
have a look at the v13 code. It just loads the bytes as they belong into the
target value; while v14 loads all 32 bits in one chunk and then starts fiddling
and moving around the constituent bytes.

[Bug tree-optimization/114252] New: Introducing bswapsi reduces code performance

2024-03-06 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114252

Bug ID: 114252
   Summary: Introducing bswapsi reduces code performance
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57628
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57628&action=edit
GNU-C test case

typedef __UINT8_TYPE__ uint8_t;
typedef __UINT32_TYPE__ uint32_t;

typedef uint8_t __attribute__((vector_size(4))) v4u8_t;

uint32_t func1 (const uint8_t *buf) {
    v4u8_t v4 = { buf[1], buf[0], buf[3], buf[2] };

    return (uint32_t) v4;
}

Compile the code with

$ avr-gcc code.c -S -Os -dp

with v13 the result is:


func1:
mov r30,r24  ;  37  [c=4 l=1]  movqi_insn/0
mov r31,r25  ;  38  [c=4 l=1]  movqi_insn/0
ldd r22,Z+1  ;  39  [c=4 l=1]  movqi_insn/3
ld r23,Z ;  40  [c=4 l=1]  movqi_insn/3
ldd r24,Z+3  ;  41  [c=4 l=1]  movqi_insn/3
ldd r25,Z+2  ;  42  [c=4 l=1]  movqi_insn/3
/* epilogue start */
ret  ;  45  [c=0 l=1]  return

which is good code: insn 37, 38 move the address to pointer register Z, and
then follow 4 loads, one for each byte.

When compiled with v14 however:

func1:
mov r30,r24  ;  23  [c=4 l=2]  *movhi/0
mov r31,r25
ld r22,Z ;  24  [c=16 l=4]  *movsi/2
ldd r23,Z+1
ldd r24,Z+2
ldd r25,Z+3
rcall __bswapsi2 ;  25  [c=16 l=1]  *bswapsi2.libgcc
mov r31,r23  ;  32  [c=4 l=1]  movqi_insn/0
mov r23,r25  ;  33  [c=4 l=1]  movqi_insn/0
mov r25,r31  ;  34  [c=4 l=1]  movqi_insn/0
mov r31,r22  ;  35  [c=4 l=1]  movqi_insn/0
mov r22,r24  ;  36  [c=4 l=1]  movqi_insn/0
mov r24,r31  ;  37  [c=4 l=1]  movqi_insn/0
/* epilogue start */
ret  ;  40  [c=0 l=1]  return


Target: avr
Configured with: ../../source/gcc-master/configure --target=avr --disable-nls
--with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared
--enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 14.0.1 20240303 (experimental) (GCC)

[Bug target/81473] [avr] build fails due to INT8_MIN and friends.

2024-03-05 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81473

--- Comment #4 from Georg-Johann Lay  ---
This was fixed long ago.

[Bug rtl-optimization/114243] [avr] -fsplit-wide-types bloats code by more than 50%

2024-03-05 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114243

--- Comment #1 from Georg-Johann Lay  ---
May be related to PR110093.  As Vladimir noted in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110093#c5

the problem is that data flow analysis cannot cope with the subregs generated
from lower-subregs, and register alloc chokes at it.

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2024-03-05 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #24 from Georg-Johann Lay  ---
(In reply to Georg-Johann Lay from comment #23)
> As it appears, this bug is not fixed completely.  For the -mmcu=avrtiny
> architecture, there is still bloat for even the smallest test cases like:

Different story, f'up to PR113927.

[Bug rtl-optimization/114243] New: -fsplit-wide-types bloats code by more than 50%

2024-03-05 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114243

Bug ID: 114243
   Summary: -fsplit-wide-types bloats code by more than 50%
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57616
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57616&action=edit
pi-sigma.c: C99 test case

Compile the attached test case with:

$ avr-gcc pi-sigma.c -c -Os -mmcu=atmega8 -fstack-usage && avr-size pi-sigma.o

Then the code sizes are for respective versions of the compiler:

avr-gcc-v8:   624
avr-gcc-v14: 1008

which is an increase of code size of more than 60% !

The stack usage also increases by a lot. According to pi-sigma.su:

avr-gcc-v8:

pi-sigma.c:80:7:sigma   30  static
pi-sigma.c:86:7:pi_n    14  static

avr-gcc-v14:

pi-sigma.c:80:7:sigma   86  static
pi-sigma.c:86:7:pi_n    36  static

That is, for the 1st function the stack use almost triples!

With -fno-split-wide-types the performance of v14 code is similar to v8.

Target: avr
Configured with: ../../source/gcc-master/configure --target=avr --disable-nls
--with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared
--enable-languages=c,c++ 
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 14.0.1 20240303 (experimental) (GCC)

[Bug rtl-optimization/114208] RTL DSE deletes a store that is not dead

2024-03-04 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114208

--- Comment #5 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #4)
> Did it ever work?
No.  I allowed -mfuse-add=3 to reproduce this PR because there seems to be a
problem with DSE, and for the case that someone is going to fix it before it
bites an important target.  The -mfuse-add optimization tries to avoid the
broken parts of DSE and works around them; only -mfuse-add=0...2 are
documented.  It was added in Feb 2024 as PR114100.

>  I suppose 'st Y+,r20 is' post-inc so maybe DSE mishandles this somehow.
That post-inc is only generated after .dse2: .split2 splits some move insns:
These cores don't have reg+offset addressing, so the backend must pretend to
support it.  Then .split2 generates pointer-adjust + mem-access +
undo-pointer-adjust.  The address adjustments are plain additions of the
address register (frame pointer in this case) and have according
REG_CFA_ADJUST_CFA notes.  Then .dse2 removes some non-dead stores.  The 'st
Y+,r20' you mentioned is only generated by .avr-fuse-add which runs after
.dse2.
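
As a rough model of the adjust / access / undo sequence described above, here is a C sketch. The function store_with_adjust is hypothetical and only mimics the shape of the split; it says nothing about how DSE treats the resulting insns.

```c
#include <stdint.h>

/* Hypothetical model of a store to "Y + offset" on a core that has no
   reg+offset addressing: the pointer itself is moved, used, and moved back. */
void store_with_adjust (uint8_t **y, int offset, uint8_t value)
{
    *y += offset;   /* pointer adjust: plain addition on the address register */
    **y = value;    /* memory access through the adjusted pointer */
    *y -= offset;   /* undo the pointer adjust */
}
```

After the call the pointer has its old value again, so the store is only meaningful to a pass that tracks the intermediate pointer adjustments.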

I'd guess that GCC is not ready for targets with such tight addressing modes?
(without reg+offset addressing; stack-pointer cannot be used either, the only
SP accesses are PUSH and POP).

ad "needs-bisection": -mfuse-add is a new target optimization added as PR114100
in Feb 2024, so bi-secting won't work because -mfuse-add is not recognized
prior to that date.

[Bug other/114191] Flags "Warning" and "Target" don't mix well in target.opt files

2024-03-04 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114191

--- Comment #3 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #1)
> How did you specify 'Target'?

Like:

Wmisspelled-isr
Target Warning C C++ Var(avr_warn_misspelled_isr) Init(1)
Warn if the ISR is misspelled, ...

[Bug other/114191] Flags "Warning" and "Target" don't mix well in target.opt files

2024-03-04 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114191

--- Comment #2 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #1)
> Wmisspelled-isr
> Target C C++ Var(avr_warn_misspelled_isr) Init(1)
> Warn if the ISR is misspelled, ...
> 
> should eventually work?

With that, the warnings appear as they should, but the option is not
recognized:

$ avr-gcc signal.c -S -Wmisspelled-isr
error: unrecognized command-line option '-Wmisspelled-isr'

$ avr-gcc signal.c -S -Wno-misspelled-isr
error: unrecognized command-line option '-Wno-misspelled-isr'

$ avr-gcc signal.c -S -Werror=misspelled-isr
error: '-Werror=misspelled-isr': '-Wmisspelled-isr' is not an option that
controls warnings

[Bug rtl-optimization/114208] RTL DSE deletes a store that is not dead

2024-03-02 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114208

--- Comment #3 from Georg-Johann Lay  ---
(In reply to Andrew Pinski from comment #1)
> I wonder if this is related to r14-6674-g4759383245ac97 .

Seems unrelated: When I reverse-apply r14-6674 then the issue does not go away.

[Bug rtl-optimization/114208] RTL DSE deletes a store that is not dead

2024-03-02 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114208

--- Comment #2 from Georg-Johann Lay  ---
(In reply to Andrew Pinski from comment #1)
> I wonder if this is related to r14-6674-g4759383245ac97 .

Not unlikely. PR112525 tries to eliminate dead stores for arguments that are
passed.  It seems like that change misses some required conditions like
frame-pointer / arg-pointer adjustments.

[Bug rtl-optimization/114208] New: DSE deletes a store that is not dead

2024-03-02 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114208

Bug ID: 114208
   Summary: DSE deletes a store that is not dead
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57594
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57594&action=edit
Reduced C test case

$ avr-gcc -mmcu=attiny40 bug-dse.c -S -Os -dp -mfuse-add=3 -fdse

the following C test case:

struct S { char a, b; };

__attribute__((__noinline__,__noclone__))
void test (const struct S *s)
{
    if (s->a != 3 || s->b != 4)
        __builtin_abort();
}

int main (void)
{
    struct S s = { 3, 4 };
    test (&s);

    return 0;
}

Then with DSE off (-fno-dse), main has a store of 3 into s.a:

main:
...
ldi r20,lo8(3)   ;  22  [c=4 l=1]  movqi_insn/1
ld __tmp_reg__,Y+;  24  [c=4 l=1]  *addhi3/3
st Y+,r20;  48  [c=4 l=1]  movqi_insn/2
ldi r20,lo8(4)   ;  27  [c=4 l=1]  movqi_insn/1
st Y,r20 ;  30  [c=4 l=1]  movqi_insn/2
...

but with DSE on, pass .dse2 removes the first store (insn 48, and in the wake
also insn 22) that sets s.a to 3:

main:
...
ldi r20,lo8(4)   ;  27  [c=4 l=1]  movqi_insn/1
subi r28,-2  ;  29  [c=4 l=2]  *addhi3/3
sbci r29,-1
st Y,r20 ;  30  [c=4 l=1]  movqi_insn/2
...

Configured with: ../../source/gcc-master/configure --target=avr --disable-nls
--with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared
--enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 14.0.1 20240302 (experimental) (GCC)

[Bug other/114191] New: Flags "Warning" and "Target" don't mix well in target.opt files

2024-03-01 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114191

Bug ID: 114191
   Summary: Flags "Warning" and "Target" don't mix well in
target.opt files
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
      Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

In an .opt file, a backend can define target-specific diagnostic options, for
example gcc/config/avr/avr.opt has:

Wmisspelled-isr
Warning C C++ Var(avr_warn_misspelled_isr) Init(1)
Warn if the ISR is misspelled, ...

This is a "Target" option however (so it should be listed with --help=target,
which it currently is not). However, specifying the "Target" flag in avr.opt
causes the option to no longer be recognized:

$ avr-gcc main.c -c -Wall -Wmisspelled-isr
cc1: error: unrecognized command-line option '-Wmisspelled-isr'

I can reproduce this for target avr, but it likely affects all other targets as
well.

I set the component to "other", as there appears to be no Bugzilla component
for such internal problems.

[Bug target/114100] [avr] Inefficient indirect addressing on Reduced Tiny

2024-03-01 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114100

Georg-Johann Lay  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Georg-Johann Lay  ---
Improved in v14

[Bug target/114132] [avr] Code sets up a frame pointer without need

2024-02-29 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114132

Georg-Johann Lay  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Georg-Johann Lay  ---
Fixed in v14.

[Bug target/114132] [avr] Code sets up a frame pointer without need

2024-02-27 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114132

Georg-Johann Lay  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Priority|P3  |P4
 Target||avr

[Bug target/114132] New: [avr] Code sets up a frame pointer without need

2024-02-27 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114132

Bug ID: 114132
   Summary: [avr] Code sets up a frame pointer without need
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

$ avr-gcc -S -Os -mmcu=attiny40 

of 

void funcab_c (long x, char c) {
}

sets up a frame-pointer without need.

Arguments x and c occupy all of the argument registers R25..R20, so that no arg
registers are left.  Then there is this implementation of
TARGET_FRAME_POINTER_REQUIRED in avr.cc:

static bool
avr_frame_pointer_required_p (void)
{
  return (cfun->calls_alloca
          || cfun->calls_setjmp
          || cfun->has_nonlocal_label
          || crtl->args.info.nregs == 0
          || get_frame_size () > 0);
}

The problem is that crtl->args.info.nregs == 0 does not discriminate between
the case where an arg pointer is needed and the case where it is not needed but
all arg regs are used up (like in the example).
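
The distinction that the predicate misses can be sketched as a small standalone
C model.  The struct and its field names (nregs_left, has_stack_args) are
hypothetical and are not GCC's data structures; this only illustrates the
condition and is not meant as the actual fix.

```c
#include <stdbool.h>

/* Hypothetical model of the argument-passing state; not GCC's own type.  */
struct arg_info
{
  int nregs_left;       /* Argument registers still unused.  */
  bool has_stack_args;  /* True if some argument is passed on the stack.  */
};

/* Test as in the report: fires whenever no arg registers are left.  */
static bool
needs_fp_current (const struct arg_info *info)
{
  return info->nregs_left == 0;
}

/* Assumed refinement: only ask for a frame pointer if arguments
   actually live on the stack.  */
static bool
needs_fp_refined (const struct arg_info *info)
{
  return info->has_stack_args;
}
```

For funcab_c from above, all registers are used up but nothing is passed on the
stack, so the two predicates of this model disagree.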

[Bug middle-end/114111] New: [avr] Expensive code instead of conditional branch.

2024-02-26 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114111

Bug ID: 114111
   Summary: [avr] Expensive code instead of conditional branch.
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57541
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57541&action=edit
addcc.c: C test case

Compile the code with avr-gcc -S -Os -dp:

int add_ge0 (int x, char c) {
    return x + (c >= 0);
}

int add_eq0 (int x, char c) {
    return x + (c == 0);
}

int add_le0 (int x, char c) {
    return x + (c <= 0);
}

int add_ge1 (int x, char c) {
    return x + (c >= 1);
}

int add_ltm3 (int x, char c) {
    return x + (c < -3);
}

int add_bit6 (int x, char c) {
    return x + !!(c & (1 << 6));
}

int add_nbit6 (int x, char c) {
    return x + !(c & (1 << 6));
}

All these could be performed by a test and the addition of x in an if-block. 
But what the compiler does is to extend the 8-bit value c to 16 bit, then
complement it, then shift the MSB to the LSB:

add_ge0:
mov __tmp_reg__,r22  ;  23  [c=12 l=3]  *extendqihi2/0
lsl r0  
sbc r23,r23
com r22  ;  24  [c=8 l=2]  *one_cmplhi2
com r23
bst r23,7;  31  [c=16 l=4]  *lshrhi3_const/3
clr r22
clr r23
bld r22,0
add r24,r22  ;  26  [c=8 l=2]  *addhi3/0
adc r25,r23
ret  ;  29  [c=0 l=1]  return

Even when it does a conditional to set the addend, it should rather have the
addition in the if-block (and moving x to R18 adds even more bloat):

add_eq0:
mov r18,r24  ;  44  [c=4 l=1]  movqi_insn/0
mov r19,r25  ;  45  [c=4 l=1]  movqi_insn/0
ldi r24,lo8(1)   ;  46  [c=4 l=2]  *movhi/4
ldi r25,0   
cp r22, __zero_reg__ ;  47  [c=4 l=1]  cmpqi3/0
breq .L3 ;  48  [c=4 l=1]  branch
ldi r24,0;  43  [c=4 l=2]  *movhi/1
ldi r25,0   
.L3:
add r24,r18  ;  42  [c=8 l=2]  *addhi3/0
adc r25,r19
ret  ;  51  [c=0 l=1]  return

...
.ident  "GCC: (GNU) 14.0.1 20240212 (experimental)"

With avr-gcc 3.4.6 from around 2006, the generated code is as follows:

add_ge0:
sbrs r22,7   ;  38  *sbrx_branch[length = 2]
adiw r24,1   ;  15  *addhi3/2   [length = 1]
.L2:
ret  ;  37  return  [length = 1]

add_eq0:
tst r22  ;  13  tstqi   [length = 1]
brne .L4 ;  14  branch  [length = 1]
adiw r24,1   ;  15  *addhi3/2   [length = 1]
.L4:
ret  ;  35  return  [length = 1]

etc.  So at some point in time GCC lost all that smartness.

This appears to come from emit_store_flag and friends; as far as I can see it
doesn't even try to work out costs.
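
For reference, the "test and add in an if-block" shape that the old output
corresponds to can be written in C as follows.  This is only a sketch: the
function names with _flag and _branch suffixes are made up here, and signed
char is used (an assumption, to match a signed plain char) so that the code
behaves the same on hosts where plain char is unsigned.  Whether the compiler
treats the two source forms differently is not claimed.

```c
/* Flag form, as in the test case: add the result of the comparison.  */
int add_ge0_flag (int x, signed char c)
{
  return x + (c >= 0);
}

/* Branch form: test c and perform the addition inside the if-block.  */
int add_ge0_branch (int x, signed char c)
{
  if (c >= 0)
    x += 1;
  return x;
}

/* Same idea for the == 0 case.  */
int add_eq0_branch (int x, signed char c)
{
  if (c == 0)
    x += 1;
  return x;
}
```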

[Bug target/114100] [avr] Inefficient indirect addressing on Reduced Tiny

2024-02-25 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114100

Georg-Johann Lay  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Priority|P3  |P4
   Keywords||missed-optimization
 Target||avr

[Bug target/114100] New: [avr] Inefficient indirect addressing on Reduced Tiny

2024-02-25 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114100

Bug ID: 114100
   Summary: [avr] Inefficient indirect addressing on Reduced Tiny
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

The Reduced Tiny core does not support indirect addressing with offset, which
basically means that every indirect memory access with a size of more than one
byte is effectively POST_INC or PRE_DEC.  The lack of that addressing mode is
currently handled by pretending to support it, and then letting the insn
printers add and subtract offsets again as needed on the fly.

For example, the following C code

   int vars[10];

   void inc_var2 (void) {
  ++vars[2];
   }

is compiled to:

   ldi r30,lo8(vars) ;  14   [c=4 l=2]  *movhi/4
   ldi r31,hi8(vars)
   subi r30,lo8(-(4));  15   [c=8 l=6]  *movhi/2
   sbci r31,hi8(-(4))
   ld r20,Z+
   ld r21,Z
   subi r30,lo8((4+1))
   sbci r31,hi8((4+1))
   subi r20,-1 ;  16   [c=4 l=2]  *addhi3_clobber/1
   sbci r21,-1
   subi r30,lo8(-(4+1));  17   [c=4 l=4]  *movhi/3
   sbci r31,hi8(-(4+1))
   st Z,r21
   st -Z,r20

where the code could be:

   ldi r30,lo8(vars+4);  28   [c=4 l=2]  *movhi/4
   ldi r31,hi8(vars+4)
   ld r20,Z+  ;  17   [c=8 l=2]  *movhi/2
   ld r21,Z+
   subi r20,-1;  19   [c=4 l=2]  *addhi3_clobber/1
   sbci r21,-1
   st -Z,r21  ;  30   [c=4 l=2]  *movhi/3
   st -Z,r20
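
The access pattern of the shorter sequence can be modelled in portable C.  This
is a sketch only: the function name is made up, and the byte array stands in
for memory addressed through Z, with the 16-bit value stored little-endian.

```c
#include <stdint.h>

/* Increment the little-endian 16-bit value at byte offset OFF of MEM,
   using only post-increment loads and pre-decrement stores.  */
void inc16_postinc_predec (uint8_t *mem, unsigned off)
{
  uint8_t *z = mem + off;            /* ldi r30/r31 with vars+4 */
  uint8_t lo = *z++;                 /* ld r20,Z+ */
  uint8_t hi = *z++;                 /* ld r21,Z+ */
  uint16_t v = (uint16_t) (lo | (hi << 8));

  v = (uint16_t) (v + 1);            /* subi/sbci with -1 */
  *--z = (uint8_t) (v >> 8);         /* st -Z,r21 */
  *--z = (uint8_t) (v & 0xff);       /* st -Z,r20 */
}
```

After the two loads the pointer sits just past the value, so the two
pre-decrement stores need no extra address arithmetic.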

[Bug target/97276] A whole if-block is ignored by avr-gcc 9.3.0

2024-02-20 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97276

Georg-Johann Lay  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-20
 Ever confirmed|0   |1
 Status|UNCONFIRMED |WAITING

[Bug middle-end/113974] Attribute common ignored

2024-02-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113974

--- Comment #3 from Georg-Johann Lay  ---
Then the documentation should make that clear that with -fno-data-sections the
object goes in COMM, but with -fdata-sections it does not and the attribute
"common" is ignored.

Better still, the compiler would behave as documented irrespective of
-f[no]-data-sections.

This is an issue of the compiler, not of the assembler.

Presumably clang just copied gcc behaviour back then?

[Bug other/113974] New: Attribute common ignored

2024-02-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113974

Bug ID: 113974
   Summary: Attribute common ignored
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

__attribute__((common,used))
static int cc;

When this code is compiled with -S -fdata-sections, cc is not put into
.lcomm (and is not .local .comm either):

.section .bss.cc,"aw",@nobits
.align 4
.type   cc, @object
.size   cc, 4
cc:
.zero   4
.ident  "GCC: (GNU) 13.2.1 20231022"

With -fno-data-sections, though, it works as expected:

.local  cc
.comm   cc,4,4

[Bug target/113934] Switch avr to LRA

2024-02-16 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113934

--- Comment #1 from Georg-Johann Lay  ---
What's the LRA way to do LEGITIMIZE_RELOAD_ADDRESS?

[Bug target/113927] [avr-tiny] Sets up a stack-frame even for trivial code

2024-02-15 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113927

Georg-Johann Lay  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Keywords|missed-optimization |
   Target Milestone|--- |13.3
  Component|other   |target
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Georg-Johann Lay  ---
Fixed in v13.3+

[Bug other/113927] New: [avr-tiny] Sets up a stack-frame even for trivial code

2024-02-15 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113927

Bug ID: 113927
   Summary: [avr-tiny] Sets up a stack-frame even for trivial code
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

Code like

char func (char c)
{
    return c;
}

compiles as expected to

func:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
/* epilogue start */
ret

with  avr-gcc -S -Os -mmcu=attiny26 -da , but for attiny40 (Reduced Tiny with
16 GPRs only) the result is:

func:
push r28
push r29
push __tmp_reg__
in r28,__SP_L__
in r29,__SP_H__
/* prologue: function */
/* frame size = 1 */
/* stack size = 3 */
.L__stack_usage = 3
/* epilogue start */
pop __tmp_reg__
pop r29
pop r28
ret

In .asmcons, i.e. just prior to register allocation, the code reads:

(insn 13 4 2 2 (set (reg:QI 46)
(reg:QI 24 r24 [ c ])) "main.c":2:1 86 {movqi_insn_split}
 (expr_list:REG_DEAD (reg:QI 24 r24 [ c ])
(nil)))
(insn 2 13 3 2 (set (reg/v:QI 44 [ c ])
(reg:QI 46)) "main.c":2:1 86 {movqi_insn_split}
 (expr_list:REG_DEAD (reg:QI 46)
(nil)))
(note 3 2 10 2 NOTE_INSN_FUNCTION_BEG)
(insn 10 3 11 2 (set (reg/i:QI 24 r24)
(reg/v:QI 44 [ c ])) "main.c":4:1 86 {movqi_insn_split}
 (expr_list:REG_DEAD (reg/v:QI 44 [ c ])
(nil)))
(insn 11 10 0 2 (use (reg/i:QI 24 r24)) "main.c":4:1 -1
 (nil))

so everything is fine and this PR is not a dup of PR110093.  According to
Vladimir Makarov, PR110093 is because DFA cannot handle subregs, but the RTL
code above does not have subregs.  What actually happens is that IRA computes
very high register costs, for example in .ira:

Pass 0 for finding pseudo/allocno costs

a1 (r46,l0) best NO_REGS, allocno NO_REGS
a0 (r44,l0) best NO_REGS, allocno NO_REGS

  a0(r44,l0) costs: POINTER_X_REGS:65535000 POINTER_Y_REGS:65535000
POINTER_Z_REGS:65535000 BASE_POINTER_REGS:65535000 POINTER_REGS:65535000
SIMPLE_LD_REGS:65535000 GENERAL_REGS:65535000 MEM:3000

whereas the .ira for attiny26 (ordinary core with 32 GPRs):

Pass 0 for finding pseudo/allocno costs

a0 (r46,l0) best GENERAL_REGS, allocno GENERAL_REGS

  a0(r46,l0) costs: POINTER_X_REGS:4000 POINTER_Y_REGS:4000 POINTER_Z_REGS:4000
BASE_POINTER_REGS:4000 POINTER_REGS:4000 ADDW_REGS:4000 SIMPLE_LD_REGS:4000
LD_REGS:4000 NO_LD_REGS:4000 GENERAL_REGS:4000 MEM:4000

../../source/gcc-master/configure --target=avr --disable-nls --with-dwarf2
--with-gnu-as --with-gnu-ld --disable-shared --enable-languages=c,c++

[Bug target/105523] Wrong warning array subscript [0] is outside array bounds

2024-02-12 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105523

Georg-Johann Lay  changed:

   What|Removed |Added

   Target Milestone|--- |13.3

--- Comment #37 from Georg-Johann Lay  ---
Back-ported to v13.3

[Bug rtl-optimization/101188] [11/12/13 Regression] [postreload] Uses content of a clobbered register

2024-02-09 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188

Georg-Johann Lay  changed:

   What|Removed |Added

 Resolution|FIXED   |---
 Status|RESOLVED|REOPENED
Summary|[postreload] Uses content   |[11/12/13 Regression]
   |of a clobbered register |[postreload] Uses content
   ||of a clobbered register

--- Comment #19 from Georg-Johann Lay  ---
Reopened for back-porting.

[Bug target/113824] AVR: ATA5795 in wrong multilib set

2024-02-08 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113824

Georg-Johann Lay  changed:

   What|Removed |Added

   Target Milestone|--- |13.3

[Bug target/113824] AVR: ATA5795 in wrong multilib set

2024-02-08 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113824

Georg-Johann Lay  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Georg-Johann Lay  ---
Fixed in v12.4 and v13.3+

[Bug target/113824] New: AVR: ATA5795 in wrong multilib set

2024-02-08 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113824

Bug ID: 113824
   Summary: AVR: ATA5795 in wrong multilib set
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

This device is currently filed in avr5, whereas according to
https://github.com/avrdudes/avr-libc/issues/874#issuecomment-1933051758
it should be in avr4.

[Bug target/112944] AVR: Support .rodata in Flash for Devices with FLMAP

2024-02-01 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112944

--- Comment #3 from Georg-Johann Lay  ---
See also the GCC v14 Release Notes:

https://gcc.gnu.org/gcc-14/changes.html#avr

[Bug target/113601] avr: Wrong SRAM start for ATmega3208 and ATmega3209

2024-01-25 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113601

Georg-Johann Lay  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |13.3

--- Comment #4 from Georg-Johann Lay  ---
Fixed in v12.4+ and v13.3+

[Bug target/113601] avr: Wrong SRAM start for ATmega3208 and ATmega3209

2024-01-25 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113601

Georg-Johann Lay  changed:

   What|Removed |Added

 Target||avr
   Priority|P3  |P4

[Bug target/113601] New: avr: Wrong SRAM start for ATmega3208 and ATmega3209

2024-01-25 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113601

Bug ID: 113601
   Summary: avr: Wrong SRAM start for ATmega3208 and ATmega3209
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

ATmega3208/9 have SRAM from 0x3000 to 0x3fff, which is 4KiB.

The hardware description in avr-mcus.def uses a start at 0x3800, which is not
correct.  This leads to a wrong -Tdata option when linking.

As a work-around, pass -Tdata 0x803000 when linking, or fix the respective
option in device-specs/specs-atmega3208/9.
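
The numbers above can be double-checked with a few lines of C.  The helper
names are made up for this sketch; the constant 0x800000 is the offset that
the AVR GNU tools add to RAM addresses, which is why the work-around passes
0x803000.

```c
/* Offset the AVR GNU tools add to data-space (RAM) addresses.  */
#define AVR_DATA_OFFSET 0x800000UL

/* Size in bytes of an inclusive address range.  */
static unsigned long sram_size (unsigned long first, unsigned long last)
{
  return last - first + 1;
}

/* Value to pass to -Tdata for a given SRAM start address.  */
static unsigned long tdata_value (unsigned long sram_start)
{
  return AVR_DATA_OFFSET + sram_start;
}
```

With the wrong start of 0x3800 the same computation yields 0x803800, which is
the value that ends up in -Tdata without the work-around.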

[Bug debug/113481] avr: internal compiler error in decompose, at rtl.h:2298

2024-01-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113481

--- Comment #6 from Georg-Johann Lay  ---
This is as simple as it gets:

void Afun5 (__uint24 num)
{
  __asm volatile ("" :: "r" (num));
}

void func (void)
{
  Afun5 (0xcc);
}

[Bug debug/113481] avr: internal compiler error in decompose, at rtl.h:2298

2024-01-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113481

--- Comment #5 from Georg-Johann Lay  ---
Here is a somewhat reduced test:

/* { dg-do run } */
/* { dg-options "-g -O2" } */

typedef __UINT8_TYPE__ uint8_t;
typedef __uint24 uint24_t;

#define BBB 23
#define INC 1
#define VAL 0xcc

uint8_t Afun5 (uint24_t num)
{
  uint8_t b = 0;
  __asm ("sbrc %T1%T2 $ subi %0,%n3"
 : "+d" (b)
 : "r" (num), "n" (BBB), "n" (INC));
  return b;
}

uint8_t Cfun5 (uint24_t num)
{
  uint8_t b = 0;
  if (num & ((uint24_t) 1 << BBB))
    b += INC;
  return b;
}

int main (void)
{
  if (Afun5 (VAL) != Cfun5 (VAL))
    __builtin_abort();

  return 0;
}

[Bug debug/113481] avr: internal compiler error in decompose, at rtl.h:2298

2024-01-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113481

--- Comment #4 from Georg-Johann Lay  ---
What's strange is that it only occurs with -g.

[Bug debug/113481] [14 Regession] avr: internal compiler error in decompose, at rtl.h:2298

2024-01-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113481

--- Comment #2 from Georg-Johann Lay  ---
It does not occur with avr-gcc-v13 -fchecking

[Bug debug/113480] [14 Regression] avr: internal compiler error: in add_dwarf_attr, at dwarf2out.cc:4503

2024-01-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113480

--- Comment #3 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #2)
> Does it fail with GCC 13 as well (if you add -fchecking)?

Yes.  And it goes away without -g.

[Bug debug/113481] [14 Regession] avr: internal compiler error in decompose, at rtl.h:2298

2024-01-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113481

Georg-Johann Lay  changed:

   What|Removed |Added

Summary|avr: internal compiler  |[14 Regession] avr:
   |error in decompose, at  |internal compiler error in
   |rtl.h:2298  |decompose, at rtl.h:2298
 Target||avr
   Keywords||ice-on-valid-code

--- Comment #1 from Georg-Johann Lay  ---
The test uses __int24, so maybe the assertion gets something wrong with
PSImode.

[Bug debug/113481] New: avr: internal compiler error in decompose, at rtl.h:2298

2024-01-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113481

Bug ID: 113481
   Summary: avr: internal compiler error in decompose, at
rtl.h:2298
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57139
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57139&action=edit
C test case from gcc.target/avr/torture

I see this ICE with current compiler from master with an avr test case:

$ avr-gcc -g -O3 -std=gnu99 -mmcu=atmega128 pr109907-2.c -S

The ICE goes away when I remove the -g switch.

Target: avr
Configured with: ../../source/gcc-master/configure --target=avr --disable-nls
--with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared
--with-fixed-point=no --with-long-double=64 --enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 14.0.1 20240118 (experimental) (GCC)

[Bug debug/113480] [14 Regression] avr: internal compiler error: in add_dwarf_attr, at dwarf2out.cc:4503

2024-01-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113480

--- Comment #1 from Georg-Johann Lay  ---
The ICE actually goes away when I remove the __flash from the test case posted
in comment #0.

[Bug debug/113480] New: avr: internal compiler error: in add_dwarf_attr, at dwarf2out.cc:4503

2024-01-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113480

Bug ID: 113480
   Summary: avr: internal compiler error: in add_dwarf_attr, at
dwarf2out.cc:4503
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

This is a relatively new ICE that I see on the avr target.

Compiled with "version 14.0.1 20240118 (experimental) (avr)"

$ avr-gcc pr83801.c -g -O3 -std=gnu99 -mmcu=atmega128

/* { dg-options { "-std=gnu99" } } */
/* { dg-do run { target { ! avr_tiny } } } */

__attribute((noinline,noclone))
char to_ascii (unsigned i)
{
    static const char __flash code_tab[] = "0123456789";
    return code_tab[i];
}

int main()
{
  if (to_ascii (2) != '2')
    __builtin_abort();

  return 0;
}

I am not sure if it's related to address-spaces, but there are fails that don't
have ASes like gcc.target/avr/torture/pr109907-2.c (but that has __int24
though).

Target: avr
Configured with: ../../source/gcc-master/configure --target=avr --disable-nls
--with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared
--with-fixed-point=no  --with-long-double=64 --enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 14.0.1 20240118 (experimental) (GCC)

[Bug rtl-optimization/56442] postreload uses content of clobbered register

2024-01-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56442

--- Comment #4 from Georg-Johann Lay  ---
Maybe this is similar to PR101188.

[Bug target/113156] [11/12/13/14 Regression] AVR build broken due to ICE while compiling libgcc, started with r14-6201-gf0a90c7d7333fc

2024-01-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113156

--- Comment #17 from Georg-Johann Lay  ---
(In reply to Andrew Pinski from comment #11)
> The patch is semi-correct

Would you elaborate on this? Which parts of a complete fix are (still) missing?

[Bug target/107201] [avr] -nodevicelib not working for devices -mmcu=avr...

2024-01-16 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107201

--- Comment #6 from Georg-Johann Lay  ---
As a work-around, one can use an adjusted device-specs file with the
avrlibc_devicelib removed.  The spec looks like this:

*avrlibc_devicelib:
%{!nodevicelib:-lavr128da32}

[Bug target/107201] [avr] -nodevicelib not working for devices -mmcu=avr...

2024-01-16 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107201

Georg-Johann Lay  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |13.3
   Priority|P3  |P4
   Keywords|patch   |
 Resolution|--- |FIXED

--- Comment #5 from Georg-Johann Lay  ---
Fixed in v12.4 and v13.3+.

[Bug target/113156] [11/12/13/14 Regression] AVR build broken due to ICE while compiling libgcc, started with r14-6201-gf0a90c7d7333fc

2024-01-15 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113156

Georg-Johann Lay  changed:

   What|Removed |Added

   Keywords||opt-attribute

--- Comment #15 from Georg-Johann Lay  ---
Should work now on v12.4+ and v13.3+

[Bug other/113399] New: -ffold-mem-offsets should not be a target option

2024-01-15 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113399

Bug ID: 113399
   Summary: -ffold-mem-offsets should not be a target option
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

For some reason, -ffold-mem-offsets is a target option and is displayed with
--help=target, which it should not be.

> grep -A3 mem-off common.opt
ffold-mem-offsets
Target Bool Var(flag_fold_mem_offsets) Init(1)
Fold instructions calculating memory offsets to the memory access instruction
if possible.

Added in r14-4664

[Bug target/112944] AVR: Support .rodata in Flash for Devices with FLMAP

2024-01-14 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112944

Georg-Johann Lay  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #2 from Georg-Johann Lay  ---
Implemented in v14.

[Bug c/113387] New: __attribute__ does not mix with [[gnu:]]

2024-01-14 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113387

Bug ID: 113387
   Summary: __attribute__ does not mix with [[gnu:]]
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

The following compiles fine

[[gnu::used]] __attribute__((used)) int x;

while

__attribute__((used)) [[gnu::used]] int x;

bar.c:1:1: warning: 'used' attribute does not apply to types [-Wattributes]
1 | __attribute__((used)) [[gnu::used]] int x;
  | ^
bar.c:1:37: error: expected identifier or '(' before 'int'
1 | __attribute__((used)) [[gnu::used]] int x;
  | ^~~

IMO both should be valid.

[Bug target/113156] [11/12/13/14 Regression] AVR build broken due to ICE while compiling libgcc, started with r14-6201-gf0a90c7d7333fc

2024-01-10 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113156

--- Comment #10 from Georg-Johann Lay  ---
-mdouble and -mlong-double are more complicated than other options because they
depend on each other in a way that (to my knowledge) cannot be described by
option properties. For example, in

-mdouble=64 -mlong-double=32

the second option will trigger -mdouble=32 in order to maintain double <= long
double.

This logic is implemented in avr-common.cc::avr_handle_option(), and maybe
that's the reason "Save" is needed?

Anyway, when the actions of avr_handle_option are changed by some attribute
handling, then the result will be wrong code.  For example, when the above
options, for whatever reason, lead to double=64 longdouble=32 at some
point, wrong code will be generated.
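
The dependency can be modelled in a few lines of C.  This is a sketch and not
the code from avr-common.cc: the struct and function names are made up, and
the rule for -mdouble= raising long double is assumed by symmetry; only the
-mlong-double= direction is described above.

```c
/* Hypothetical model of the two option values; not GCC's own structures.  */
struct fp_layout
{
  int double_bits;       /* 32 or 64 */
  int long_double_bits;  /* 32 or 64 */
};

/* -mdouble=BITS: raise long double if needed (assumed symmetric rule).  */
void set_mdouble (struct fp_layout *l, int bits)
{
  l->double_bits = bits;
  if (l->long_double_bits < bits)
    l->long_double_bits = bits;
}

/* -mlong-double=BITS: lower double if needed, as described above.  */
void set_mlong_double (struct fp_layout *l, int bits)
{
  l->long_double_bits = bits;
  if (l->double_bits > bits)
    l->double_bits = bits;
}
```

Applying set_mdouble (64) and then set_mlong_double (32) leaves both at 32 in
this model, so double <= long double holds after every step.  Restoring only
one of the two fields from a saved copy could break that invariant.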

[Bug target/113156] [11/12/13/14 Regression] AVR build broken due to ICE while compiling libgcc, started with r14-6201-gf0a90c7d7333fc

2024-01-10 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113156

--- Comment #8 from Georg-Johann Lay  ---
Is there a comprehensible explanation of when option property "Save" is needed?

The internals just state

> Build the cl_target_option structure to hold a copy of the option,
> add the functions cl_target_option_save and cl_target_option_restore
> to save and restore the options.

This explains what it does, but not when and why it is needed.

For example, -mdouble= and -mlong-double= are multilib options, and not
optimization options, and they should never change during a program.

Why is "Save" needed for some multilib options and not for others?

Isn't it even a bug to allow multilib options to be changed due to optimization
flags?  Code that uses -mdouble=32 in some places and -mdouble=64 in others
is bogus.

[Bug target/112952] avr: attribute address not working with -fdata-sections -fno-common

2024-01-08 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112952

--- Comment #6 from Georg-Johann Lay  ---
Also fixed on the v12 branch for v12.4+

[Bug target/112952] avr: attribute address not working with -fdata-sections -fno-common

2024-01-08 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112952

Georg-Johann Lay  changed:

   What|Removed |Added

   Priority|P3  |P5
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |13.3

--- Comment #4 from Georg-Johann Lay  ---
Fixed in v13.3+

[Bug tree-optimization/102725] -fno-builtin leads to call of strlen since r12-4283-g6f966f06146be768

2023-12-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102725

--- Comment #12 from Georg-Johann Lay  ---
(In reply to Andrew Pinski from comment #1)
> You need -fno-tree-loop-distribution -fno-tree-loop-distribute-patterns to
> turn it off.  There is another older bug about this for memcpy.

This is also a documentation issue.

Neither from the name of that option nor from its documentation can one
conclude that it inhibits open-coded-foo -> foo transformations.

And IMO options that disable all builtin generations should be available on
single function basis, like -fno-auto-strlen or whatever.

Moreover, introducing a call to strlen in a function named "strlen" (or with
assembly name "strlen") is not very wise.

[Bug tree-optimization/113049] Compiles to strlen even with -fno-builtin-strlen -fno-optimize-strlen

2023-12-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113049

--- Comment #7 from Georg-Johann Lay  ---
(In reply to Andreas Schwab from comment #6)
> That's what -fno-tree-loop-distribute-patterns is for.

So you know the GCC sources and can draw that conclusion.

The documentation of -ftree-loop-distribute-patterns does not relate in any way
to that.  It's impossible to find this option from a problem description.

[Bug tree-optimization/113049] Compiles to strlen even with -fno-builtin-strlen -fno-optimize-strlen

2023-12-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113049

--- Comment #5 from Georg-Johann Lay  ---
(In reply to Andreas Schwab from comment #4)
> -fno-builtin-strlen has a different purpose.

So then -fno-builtin should also not work? The GCC documentation of
-fno-builtin is the same as for -fno-builtin-function.

At least there should be an option to disable this, e.g. you need it when
building libgcc / libc anyway, or you get silly non-functional libs.

[Bug tree-optimization/113049] Compiles to strlen even with -fno-builtin-strlen -fno-optimize-strlen

2023-12-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113049

--- Comment #3 from Georg-Johann Lay  ---
(In reply to Mikael Pettersson from comment #2)
> Does -fno-tree-loop-distribute-patterns work? That's been the go-to for
> disabling similar loop-to-call transformations people have been objecting to.

It works. But even if it does, that's not intuitive, and -fno-builtin-strlen
should work no matter what.

And also, when a function's assembly name is "funcxyz", then the compiler
should never issue a concocted call to "funcxyz", because the compiler's
assumptions about what "funcxyz" is doing are obviously wrong.
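
For a single function, the same option can be requested with GCC's optimize
function attribute.  The following is a sketch under assumptions: the macro
name and the function name my_strlen are made up (the latter to avoid clashing
with the C library), and whether the attribute is honoured reliably enough for
a given use should be checked against the GCC documentation for that attribute.

```c
#include <stddef.h>

/* GCC only: ask for -fno-tree-loop-distribute-patterns on one function.
   Other compilers get an empty macro.  */
#if defined(__GNUC__) && !defined(__clang__)
# define NO_LOOP_TO_LIBCALL \
    __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
#else
# define NO_LOOP_TO_LIBCALL
#endif

/* Open-coded string length; the loop is the one from the test case.  */
NO_LOOP_TO_LIBCALL
size_t my_strlen (const char *text)
{
  size_t len = 0;

  while (*text)
    {
      text++;
      len++;
    }
  return len;
}
```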

[Bug tree-optimization/113049] Compiles to strlen even with -fno-builtin-strlen -fno-optimize-strlen

2023-12-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113049

--- Comment #1 from Georg-Johann Lay  ---
-fno-builtin works, but that seems too much. -fno-builtin-strlen should switch
it off IMO.

[Bug tree-optimization/113049] New: Compiles to strlen even with -fno-builtin-strlen -fno-optimize-strlen

2023-12-17 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113049

Bug ID: 113049
   Summary: Compiles to strlen even with -fno-builtin-strlen
-fno-optimize-strlen
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

Compile with -S -Os -fno-builtin-strlen -fno-optimize-strlen

typedef __SIZE_TYPE__ size_t;

size_t strlen (const char *text)
{
  size_t len = 0;

  while (*text)
  {
    text++;
    len++;
  }
  return len;

}

Generated code:

strlen:
b   strlen


Using built-in specs.
COLLECT_GCC=/opt/compiler-explorer/arm/gcc-12.2.0/arm-unknown-linux-gnueabihf/bin/arm-unknown-linux-gnueabihf-gcc
Target: arm-unknown-linux-gnueabihf
Configured with: /opt/.build/arm-unknown-linux-gnueabihf/src/gcc/configure
--build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu
--target=arm-unknown-linux-gnueabihf
--prefix=/opt/compiler-explorer/arm/gcc-12.2.0/arm-unknown-linux-gnueabihf
--exec_prefix=/opt/compiler-explorer/arm/gcc-12.2.0/arm-unknown-linux-gnueabihf
--with-sysroot=/opt/compiler-explorer/arm/gcc-12.2.0/arm-unknown-linux-gnueabihf/arm-unknown-linux-gnueabihf/sysroot
--enable-languages=c,c++,fortran,d,objc,obj-c++,go --with-arch=armv7-a
--with-fpu=neon --with-float=hard --enable-__cxa_atexit --disable-libmudflap
--enable-libgomp --enable-libssp --enable-libquadmath
--enable-libquadmath-support --enable-libsanitizer --disable-libmpx
--with-gmp=/opt/.build/arm-unknown-linux-gnueabihf/buildtools
--with-mpfr=/opt/.build/arm-unknown-linux-gnueabihf/buildtools
--with-mpc=/opt/.build/arm-unknown-linux-gnueabihf/buildtools
--with-isl=/opt/.build/arm-unknown-linux-gnueabihf/buildtools --enable-lto
--enable-threads=posix --enable-target-optspace --disable-plugin --disable-nls
--enable-tls --disable-multilib
--with-local-prefix=/opt/compiler-explorer/arm/gcc-12.2.0/arm-unknown-linux-gnueabihf/arm-unknown-linux-gnueabihf/sysroot
--enable-long-long --with-mode=thumb
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.2.0 (GCC) 

https://godbolt.org/z/En3j6Gvda

Maybe this is the same as, or similar to, PR102725.
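
If the cause is indeed the one from PR102725 (loop distribution recognising
the loop as a strlen idiom), a possible workaround is to disable that pattern
matching for the function.  A minimal sketch follows; it is untested against
this exact configuration.  The name strlen_loop and the macro NO_LOOP_PATTERNS
are hypothetical.  The optimize attribute and the option
-fno-tree-loop-distribute-patterns are existing GCC features, but that they
cure this report is an assumption:

```c
#include <stddef.h>

/* Assumption: the call to strlen is introduced by the loop-distribution
   pattern matcher.  The attribute below turns that matcher off for a
   single function.  Other compilers may not know it, hence the guard.  */
#if defined (__GNUC__) && !defined (__clang__)
#define NO_LOOP_PATTERNS \
  __attribute__((optimize ("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_PATTERNS
#endif

/* Hypothetical name; same loop as in the test case above.  */
NO_LOOP_PATTERNS
size_t strlen_loop (const char *text)
{
  size_t len = 0;

  while (*text)
  {
    text++;
    len++;
  }
  return len;
}
```

For a whole translation unit, the equivalent would be to add
-fno-tree-loop-distribute-patterns to the command line.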

[Bug target/112944] AVR: Support .rodata in Flash for Devices with FLMAP

2023-12-15 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112944

Georg-Johann Lay  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-12-15
 Status|UNCONFIRMED |ASSIGNED

[Bug ipa/92606] [11/12/13 Regression][avr] invalid merge of symbols in progmem and data sections

2023-12-12 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92606

--- Comment #28 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #25)
> I wonder if it would be possible to set the appropriate address-space when
> parsing the "progmem" attribute in the target?

No, that's not possible. You cannot adjust all uses to also refer to the
different address-space.  And qualifiers and attributes behave quite
differently.

> For ICF (or more generally IPA) there's comp_type_attributes which
> we already check and which dispatches to target code.  We're also
> rejecting differing DECL_ATTRIBUTES:

This would make sense to also use for variables, or better still call a target
hook to reject specific combinations.  Attrs like "used", "unused" etc. should
still be ok, but the back-end knows best, IMO.

And different address-spaces might also work, e.g. when one decl is progmem
(AS0) and the other is __flash (AS1).  So the current fix misses some
opportunities (just to mention it, not that I think it would matter much).

> so I wonder what happens here?  Does AVR not actually add the progmem
> attribute?

It always adds the progmem attribute (but may bail out, e.g. when not "const").

[Bug target/112952] avr: attribute address not working with -fdata-sections -fno-common

2023-12-11 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112952

Georg-Johann Lay  changed:

   What|Removed |Added

 Target||avr
  Known to fail||13.2.0, 8.5.0
   Keywords||wrong-code

--- Comment #1 from Georg-Johann Lay  ---
Compile the following test with avr-gcc ... -fdata-sections -fno-common -S

__attribute__((__address__(0)))
char __flmap;

The generated assembly reads:

.global __flmap
.section .bss.__flmap,"aw",@nobits
.type   __flmap, @object
.size   __flmap, 1
__flmap:
.zero   1

but the expected code is:

.globl  __flmap
__flmap = 0

This problem becomes more pronounced as v10 switched from -fcommon to
-fno-common as the default.

[Bug target/112952] New: avr: attribute address not working with -fdata-sections -fno-common

2023-12-11 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112952

Bug ID: 112952
   Summary: avr: attribute address not working with
-fdata-sections -fno-common
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

[Bug ipa/92606] [11/12/13 Regression][avr] invalid merge of symbols in progmem and data sections

2023-12-10 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92606

--- Comment #24 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #22)
> Should be fixed on trunk.  Confirmation would be nice (checked x86 only).

Tested with: gcc version 14.0.0 20231210 (experimental) (GCC)

Still fails for the progmem test case from above. It still has
.set xyz,xyz_prog


typedef __UINT16_TYPE__ uint16_t;
typedef __UINT32_TYPE__ uint32_t;

typedef uint32_t T;

#define read_u32(X) \
(__extension__( \
{   \
uint16_t __addr16 = (uint16_t)(X);  \
uint32_t __result;  \
__asm__ __volatile__ ("lpm %A0, Z+" "\n\t"  \
  "lpm %B0, Z+" "\n\t"  \
  "lpm %C0, Z+" "\n\t"  \
  "lpm %D0, Z" "\n\t"   \
  : "=r" (__result), "+z" (__addr16));  \
__result;   \
}))

#define NI __attribute__((noinline,noclone))

__attribute((progmem))
static const T xyz_prog[] = { 123, 123, 123 };

static T xyz[] = { 123, 123, 123 };

volatile int x = 0;

NI void prf (T f)
{
    if (f != 123)
        __builtin_abort();
}

NI void func_progmem()
{
    prf (read_u32 (& xyz_prog[0]));
}

NI void func_ram()
{
    prf (xyz[x]);
}

int main()
{
    func_progmem();
    func_ram();
}

[Bug target/112944] AVR: Support .rodata in Flash for Devices with FLMAP

2023-12-10 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112944

Georg-Johann Lay  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |gjl at gcc dot gnu.org
 Target||avr
   Target Milestone|--- |14.0
   Severity|normal  |enhancement
   Priority|P3  |P4

[Bug target/112944] New: AVR: Support .rodata in Flash for Devices with FLMAP

2023-12-10 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112944

Bug ID: 112944
   Summary: AVR: Support .rodata in Flash for Devices with FLMAP
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

Devices from the AVR64* and AVR128* families (from avrxmega2 and avrxmega4)
see a 32 KiB portion of their program memory in the RAM address space.

Which 32 KiB segment is visible is determined by the bit-field
NVMCTRL_CTRLB.FLMAP.

This can be used to place .rodata in flash (it's currently located in RAM like
for all other AVR devices, except the ones from avrtiny and avrxmega3).

* The user should be able to choose the old .rodata location by means of
  a command line option like, say -mrodata-in-ram.  This is needed
  to return to the current behaviour with rodata in RAM if desired.

* The user may choose which 32 KiB block holds the rodata section.

* In all cases, the default configurations should work correctly without any
  user interventions / special code, irrespective of -m[no-]rodata-in-ram.

* The multilib structure is unchanged.

Locating .rodata in flash requires new linker description files, which is
addressed by Binutils https://sourceware.org/PR31124.

The compiler part of the implementation is to:

* Add new command line options -m[no-]rodata-in-ram and -mflmap.

* Use the new / previous emulations according
  to -mmcu= and -m[no-]rodata-in-ram.

* Diagnose wrong usage of -m[no-]rodata-in-ram.

* Provide built-in macros __AVR_HAVE_FLMAP__ and __AVR_RODATA_IN_RAM__.

* Don't link __do_copy_data for .rodata objects.
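
A sketch of how user code might consume the two proposed built-in macros.
The macro names come from the list above.  The value convention assumed here
(__AVR_RODATA_IN_RAM__ defined to 1 for RAM and to 0 for flash,
__AVR_HAVE_FLMAP__ either defined or undefined) and the helper names
rodata_location and have_flmap are assumptions for illustration only:

```c
/* Where the compiler placed .rodata, as far as the macros tell.  */
enum rodata_where
{
  RODATA_UNKNOWN,   /* Macro not provided, e.g. not an AVR device build.  */
  RODATA_IN_RAM,
  RODATA_IN_FLASH
};

/* Assumption: __AVR_RODATA_IN_RAM__ is 1 when .rodata is in RAM and
   0 when it is in flash.  */
static enum rodata_where rodata_location (void)
{
#if defined (__AVR_RODATA_IN_RAM__)
#if __AVR_RODATA_IN_RAM__
  return RODATA_IN_RAM;
#else
  return RODATA_IN_FLASH;
#endif
#else
  return RODATA_UNKNOWN;
#endif
}

/* Assumption: __AVR_HAVE_FLMAP__ is defined only when the FLMAP
   mechanism is in use for the current device and options.  */
static int have_flmap (void)
{
#if defined (__AVR_HAVE_FLMAP__)
  return 1;
#else
  return 0;
#endif
}
```

Code that still needs explicit flash accessors when .rodata is in RAM could
branch on rodata_location () at compile time or run time.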

[Bug ipa/92606] [11/12/13 Regression][avr] invalid merge of symbols in progmem and data sections

2023-12-10 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92606

--- Comment #23 from Georg-Johann Lay  ---
(In reply to Richard Biener from comment #22)
> Should be fixed on trunk.  Confirmation would be nice (checked x86 only).

For AVR, this does not fix the attribute progmem case (for which it was
originally reported), because that's an attribute and not an address-space.

progmem is still in wide use:

* C++, because g++ does not implement address-spaces (as opposed to
clang/llvm).

* Existing code that uses progmem, which is supposed to work.

* Not all use cases of progmem can be converted to address-spaces (PR84163),
  e.g. locating literals in program memory like in
  printf_P (PSTR ("Format string in flash: %d\n"), int_value);
  #define PSTR(s) \
  (__extension__({ \
  static const char __c[] PROGMEM = (s); \
  &__c[0]; \
  }))

A solution would be a target hook like proposed in PR92932.

Or set -fno-ipa-icf-variables by default for AVR. Less optimal code is
still better than wrong code...
