On Fri, 2017-08-04 at 17:30 -0500, Segher Boessenkool wrote:
> On Fri, Aug 04, 2017 at 08:38:11PM +0000, Wilco Dijkstra wrote:
> > Richard Henderson wrote:
> > > On 08/04/2017 05:59 AM, Prathamesh Kulkarni wrote:
> > > > For i386, it seems strcmp is expanded inline via cmpstr optab by
> > > > expand_builtin_strcmp if one of the strings is constant. Could we similarly
> > > > define cmpstr pattern for AArch64?
> > >
> > > Certainly that's possible.
> >
> > I'd suggest to do it as a target independent way, this is not a target specific
> > optimization and shouldn't be done in the target unless there are special
> > strcmp instructions.
>
> See rs6000-string.c; cc:ing Aaron.

I think I'd argue that even if there isn't an instruction that does
strcmp (i.e. repz cmpsb) you are still better off with target specific
code. For example, on power one needs to be aware of how the different
processors deal with unaligned loads and that power7 for example
doesn't like overlapping unaligned loads which are fine on power8. Also
instructions like cmpb are useful and I don't really see how a target
independent expansion would end up using that.

> > > For constant strings of small length (upto 3?), I was wondering if it'd be a
> > > good idea to manually unroll strcmp loop, similar to __strcmp_* macros in
> > > bits/string.h?
> > > For eg in gimple-fold, transform
> > > x = __builtin_strcmp(s, "ab")
> > > to
> > > x = s[0] - 'a';
> > > if (x == 0)
> > > {
> > >   x = s[1] - 'b';
> > >   if (x == 0)
> > >     x = s[2];
> > > }

There was code to manually unroll strcmp/strncmp in
string/bits/string2.h in GLIBC but that was recently removed:

https://sourceware.org/git/?p=glibc.git;a=commit;h=f7db120f67d853e0cfa272fa39c8b9be67c0a935

Personally I'm glad to see it go not only because of the expansion into
individual comparisons of the first couple characters, but also because
it expanded strcmp/strncmp to __builtin_strcmp/strncmp which meant that
you could not disable the gcc expansions of strcmp/strncmp by using
-fno-builtin-strcmp/strncmp.

> >
> > If there is already code that does something similar (see comment #1 in PR78809),
> > it could be easily adapted to handle more cases.
> >
> > > if (memcmp(s, "ab", 3) != 0)
> > >
> > > to be implemented with cmp+ccmp+ccmp and one branch.
> >
> > Even better would be wider loads if you either know the alignment of s or it's max size
> > (although given the overhead of creating the return value that works best for equality).
>
> All those things are handled there, too.  Most things that can really help
> are quite target-specific, I think.
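
For reference, here is a minimal C sketch of the unrolled form quoted
above. The function name strcmp_ab_unrolled is hypothetical and appears
nowhere in GCC or glibc. The sketch reads the bytes as unsigned char,
since that is how strcmp orders them; the quoted pseudocode does not
spell that out. It only illustrates the source-level transform and says
nothing about which instruction sequence a given target would prefer.

```c
/* Hypothetical helper: hand-unrolled equivalent of strcmp(s, "ab").
   Returns a negative, zero, or positive value with the same sign
   that strcmp(s, "ab") would return.  */
int strcmp_ab_unrolled(const char *s)
{
    /* strcmp compares bytes as unsigned char.  */
    const unsigned char *p = (const unsigned char *)s;

    int x = p[0] - 'a';
    if (x == 0) {
        /* p[0] was 'a', so p[1] is still inside the string
           (at worst it is the terminating NUL).  */
        x = p[1] - 'b';
        if (x == 0)
            x = p[2];   /* "ab" ends here: compare against '\0' */
    }
    return x;
}
```

Each byte is only read after the previous one matched, so the function
never reads past the terminating NUL of s.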

Same thing goes for memcmp, you really need to know how the target
handles aligned/unaligned and for example on power9 it is handy to be
able to use setb to produce the correct SImode result even if you just
did a DImode compare of 64 bits.

>
>
> Segher
>

  Aaron

--
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain