Hi Chris,
I vote for these priorities:
I would add functionality first.
1. code size
2. speed
A fast and not too large memcpy/memset could copy the start and end
separately using byte ops when odd, and copy the remaining data in
between with word moves.
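A rough sketch of that head/tail idea for memset, in C. This is only an
illustration, not the msp430 libc code: the name memset_words is made up,
and it assumes the buffer may legally be written with 16-bit stores once
the pointer is even. memcpy is harder, because source and destination can
have different parity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte store for an odd head/tail, word stores between. */
void *memset_words(void *dst, int value, size_t n)
{
    unsigned char *p = dst;
    unsigned char b = (unsigned char)value;
    /* Fill byte replicated into both halves of a 16-bit word. */
    uint16_t w = (uint16_t)(((uint16_t)b << 8) | b);

    if (n == 0)
        return dst;
    assert(dst != NULL);

    /* Head: one byte store if the start address is odd. */
    if ((uintptr_t)p & 1u) {
        *p++ = b;
        n--;
    }

    /* Middle: p is even here, so the 16-bit stores are aligned. */
    uint16_t *wp = (uint16_t *)(void *)p;
    while (n >= 2) {
        *wp++ = w;
        n -= 2;
    }

    /* Tail: at most one byte is left over. */
    if (n > 0)
        *(unsigned char *)wp = b;

    return dst;
}
```

The parity test happens at run time, so the fixed overhead only pays off
for longer fills.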
This is OK for longer copies. The only place where the optimizer is
generating the wrong code appears to be the word (2 byte) memcpy. The 4
byte copy is using mov.b with register indexing, which I think uses
less code space, but may be slower.
The problem I see is that the odd/even test has to be done at run time
for the indexed case, so the 2 byte (word) copy/set is not an option
there. I could optimize the code for the cases where we know at compile
time that the address is even, but I see that as a very small
optimization. If the patch I sent works, I'll have a look at it.
To cope with the indexed mode (where you don't know if it's even or odd
at compile time) the byte moves seem to be the only option.
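To illustrate the trade-off, here is a hedged C sketch of the two
choices for a 2 byte copy. The function names are invented for this
example and are not part of gcc or libc. copy2_bytes works for any
address; copy2_checked shows the extra run-time parity test a word move
would need when the address is not known at compile time, and it assumes
word access to the buffer is permitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Two byte moves: correct for odd and even addresses alike. */
void copy2_bytes(unsigned char *dst, const unsigned char *src)
{
    assert(dst != NULL && src != NULL);
    dst[0] = src[0];
    dst[1] = src[1];
}

/* Word move only when both addresses are even, decided at run time. */
void copy2_checked(unsigned char *dst, const unsigned char *src)
{
    assert(dst != NULL && src != NULL);
    if ((((uintptr_t)dst | (uintptr_t)src) & 1u) == 0) {
        /* Both even: a single aligned 16-bit move is safe. */
        *(uint16_t *)(void *)dst = *(const uint16_t *)(const void *)src;
    } else {
        /* At least one odd address: fall back to byte moves. */
        copy2_bytes(dst, src);
    }
}
```

The test and branch in copy2_checked cost more code than the second byte
move saves, which is why plain byte moves look like the better choice
for the indexed case.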
A mov.b takes 6 bytes, and the memset call takes 14 bytes, so using 2
mov.b uses less code than the memset (the 16 bit case). The 32 bit
method (used by -O3) takes 20 bytes, so it takes more code space and
uses a register, but is much faster.
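The arithmetic behind those figures can be written down as a small C
sketch. The per-instruction sizes are read off the listings in this
mail; the enum and function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

enum {
    MOVB_ABS_BYTES     = 6,  /* mov.b #imm, &dst  or  mov.b &src, &dst */
    CALL_SEQ_BYTES     = 14, /* 2 + 4 + 4 byte register loads + 4 byte call */
    MOV_IMM_REG_BYTES  = 4,  /* mov #addr, r11 */
    MOVB_AUTOINC_BYTES = 4   /* mov.b @r11+, &dst */
};

/* Code size of n inline byte moves using absolute addresses. */
size_t inline_abs_bytes(size_t n)
{
    return n * MOVB_ABS_BYTES;
}

/* Code size of n byte moves through an auto-incremented pointer register. */
size_t inline_autoinc_bytes(size_t n)
{
    return MOV_IMM_REG_BYTES + n * MOVB_AUTOINC_BYTES;
}

/* Nonzero when the inline absolute form is smaller than the call sequence. */
int inline_is_smaller(size_t n)
{
    return inline_abs_bytes(n) < CALL_SEQ_BYTES;
}

/* Checks the figures quoted in the text. */
void check_figures(void)
{
    assert(inline_abs_bytes(2) == 12);     /* 2 mov.b: 12 < 14 */
    assert(inline_autoinc_bytes(4) == 20); /* the 4 byte copy */
}
```

With these sizes the inline absolute form only wins on code size for 1
or 2 bytes; from 3 bytes on the 14 byte call sequence is smaller.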
Here is what the pre-patched compiler generates:
memset(&foo.b1, 0x44, 1);
114e: f2 40 44 00 mov.b #68, &0x0284 ;#0x0044
1152: 84 02
memset(&foo.u, 0x55, 2);
1154: b2 40 55 55 mov #21845, &0x0000 ;#0x5555
1158: 00 00
This generates an interval fixup error at link time.
memset(&foo.u, 0x55, 4);
115a: 2d 42 mov #4, r13 ;r2 As==10
115c: 3e 40 55 00 mov #85, r14 ;#0x0055
1160: 3f 40 85 02 mov #645, r15 ;#0x0285
1164: b0 12 0e 12 call #4622 ;#0x120e
memcpy(&s[5], &s[24], 2);
1168: 92 42 1c 02 mov &0x021c,&0x0000 ;0x021c
116c: 00 00
This generates an interval fixup error at link time.
memcpy(&s[i], &s[27], 2);
116e: 9b 42 00 00 mov &0x0000,516(r11);0x0000
1172: 04 02
This generates an interval fixup error at link time.
memcpy(&s[5], &s[29], 4);
1174: 3b 40 21 02 mov #545, r11 ;#0x0221
1178: f2 4b 09 02 mov.b @r11+, &0x0209 ;
117c: f2 4b 0a 02 mov.b @r11+, &0x020a ;
1180: f2 4b 0b 02 mov.b @r11+, &0x020b ;
1184: f2 4b 0c 02 mov.b @r11+, &0x020c ;
And here is the code generated by the newly patched gcc:
memset(&foo.b1, 0x44, 1);
114e: f2 40 44 00 mov.b #68, &0x0284 ;#0x0044
1152: 84 02
memset(&foo.u, 0x55, 2);
1154: f2 40 55 00 mov.b #85, &0x0285 ;#0x0055
1158: 85 02
115a: f2 40 55 00 mov.b #85, &0x0286 ;#0x0055
115e: 86 02
memset(&foo.u, 0x55, 4);
1160: 2d 42 mov #4, r13 ;r2 As==10
1162: 3e 40 55 00 mov #85, r14 ;#0x0055
1166: 3f 40 85 02 mov #645, r15 ;#0x0285
116a: b0 12 20 12 call #4640 ;#0x1220
memcpy(&s[5], &s[24], 2);
116e: d2 42 1c 02 mov.b &0x021c,&0x0209 ;0x021c
1172: 09 02
1174: d2 42 1d 02 mov.b &0x021d,&0x020a ;0x021d
1178: 0a 02
memcpy(&s[i], &s[27], 2);
117a: db 42 1f 02 mov.b &0x021f,516(r11);0x021f
117e: 04 02
1180: db 42 20 02 mov.b &0x0220,517(r11);0x0220
1184: 05 02
memcpy(&s[5], &s[29], 4);
1186: 3b 40 21 02 mov #545, r11 ;#0x0221
118a: f2 4b 09 02 mov.b @r11+, &0x0209 ;
118e: f2 4b 0a 02 mov.b @r11+, &0x020a ;
1192: f2 4b 0b 02 mov.b @r11+, &0x020b ;
1196: f2 4b 0c 02 mov.b @r11+, &0x020c ;
Regards,
--
Peter Jansen
Antarctic Division