On Tue, Nov 01, 2016 at 11:08:53AM -0700, Andrew Pinski wrote:
> On Tue, Nov 17, 2015 at 2:10 PM, Andrew Pinski <apin...@cavium.com> wrote:
> > Since ThunderX T88 pass 1 (variant 0) is an ARMv8 part while pass 2
> > (variant 1) is an ARMv8.1 part, I needed to add detecting of the variant
> > also for this difference. Also I simplify a little bit and combined the
> > single core and arch detecting cases so it would be easier to add variant.
>
> Actually it is a bit more complex than what I said here, see below for
> the full table of options and what are enabled/disabled now.
>
> > OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
> > Tested -mcpu=native on both T88 pass 1 and T88 pass 2 to make sure it is
> > detecting the two separately.
>
>
> Here is the final patch in this series updated; I changed the cpu name
> slightly and made sure I updated invoke.texi too.
>
> The names are going to match the names in LLVM (worked with our LLVM
> engineer here at Cavium about the names).
> Here are the names recorded and
> -mcpu=thunderx:
> * Matches part num 0xA0 (reserved for ThunderX 8x series)
> * T88 Pass 2 scheduling
> * Hardware prefetching (software prefetching disabled)
> * LSE enabled
> * no v8.1

This doesn't match the current LLVM proposal
(https://reviews.llvm.org/D24540), which enables full ARMv8.1-A support
for -mcpu=thunderx.

> -mcpu=thunderxt88:
> * Matches part num 0xA1
> * T88 Pass 2 scheduling
> * software prefetching enabled
> * LSE enabled
> * no v8.1
>
> -mcpu=thunderxt88p1 (only for GCC):
> * Matches part num 0xA1, variant 0
> * T88 Pass 1 scheduling
> * software prefetching enabled
> * no LSE enabled
> * no v8.1
>
> -mcpu=thunderxt81 and -mcpu=thunderxt83:
> * Matches part num 0xA2/0xA3
> * T88 Pass 2 scheduling
> * Hardware prefetching (software prefetching disabled)
> * LSE enabled
> * v8.1

This looks like what has been added to LLVM as -mcpu=thunderx.

> I have not hooked up software vs hardware prefetching and the
> scheduler parts (the next patch will do part of that); both ARMv8.1-a
> and LSE parts are hooked up as those parts are only in
> aarch64-cores.def.
>
> OK? Bootstrapped and tested on ThunderX T88 and ThunderX T81
> (aarch64-linux-gnu).
>
> Index: common/config/aarch64/aarch64-common.c
> ===================================================================
> --- common/config/aarch64/aarch64-common.c (revision 241727)
> +++ common/config/aarch64/aarch64-common.c (working copy)
> @@ -145,7 +145,7 @@ struct arch_to_arch_name
>     the default set of architectural feature flags they support.
>     */
>  static const struct processor_name_to_arch all_cores[] =
>  {
> -#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART) \
> +#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, VARIANT) \
>    {NAME, AARCH64_ARCH_##ARCH_IDENT, FLAGS},
>  #include "config/aarch64/aarch64-cores.def"
>    {"generic", AARCH64_ARCH_8A, AARCH64_FL_FOR_ARCH8},
> Index: config/aarch64/aarch64-cores.def
> ===================================================================
> --- config/aarch64/aarch64-cores.def (revision 241727)
> +++ config/aarch64/aarch64-cores.def (working copy)
> @@ -21,7 +21,7 @@
>
>     Before using #include to read this file, define a macro:
>
> -      AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART)
> +      AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, VARIANT)
>
>     The CORE_NAME is the name of the core, represented as a string constant.
>     The CORE_IDENT is the name of the core, represented as an identifier.
> @@ -39,39 +39,45 @@
>     PART is the part number of the CPU. On a GNU/Linux system it can be
>     found in /proc/cpuinfo. For big.LITTLE systems this should use the
>     macro AARCH64_BIG_LITTLE where the big part number comes as the first
> -   argument to the macro and little is the second. */
> +   argument to the macro and little is the second.
> +   VARIANT is the variant of the CPU. In a GNU/Linux system it can found
> +   in /proc/cpuinfo. If this is -1, this means it can match any variant. */
>
>  /* V8 Architecture Processors. */
>
>  /* ARM ('A') cores.
>     */
> -AARCH64_CORE("cortex-a35", cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04)
> -AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03)
> -AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07)
> -AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08)
> -AARCH64_CORE("cortex-a73", cortexa73, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, 0xd09)
> +AARCH64_CORE("cortex-a35", cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04, -1)
> +AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03, -1)
> +AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07, -1)
> +AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08, -1)
> +AARCH64_CORE("cortex-a73", cortexa73, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, 0xd09, -1)
>
>  /* Samsung ('S') cores. */
> -AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, 0x53, 0x001)
> +AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, 0x53, 0x001, -1)
>
>  /* Qualcomm ('Q') cores. */
> -AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0x800)
> +AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0x800, -1)
>
>  /* Cavium ('C') cores.
>     */
> -AARCH64_CORE("thunderx", thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a1)
> +AARCH64_CORE("thunderx", thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a0, -1)
> +AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a1, 0)
> +AARCH64_CORE("thunderxt88", thunderxt88, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a1, -1)

You probably want a comment somewhere here making it clear that the
ordering of thunderxt88p1 and thunderxt88 must remain as is, or detection
will fail: if thunderxt88 came first, its wildcard variant (-1) would match
before the variant 0 entry for thunderxt88p1 was ever considered. Otherwise
someone will come along and helpfully put these in alphabetical order and
cause you trouble...

> +AARCH64_CORE("thunderxt81", thunderxt81, thunderx, 8_1A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a2, -1)
> +AARCH64_CORE("thunderxt83", thunderxt83, thunderx, 8_1A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a3, -1)
>
>  /* APM ('P') cores. */
> -AARCH64_CORE("xgene1", xgene1, xgene1, 8A, AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000)
> +AARCH64_CORE("xgene1", xgene1, xgene1, 8A, AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000, -1)
>
>  /* V8.1 Architecture Processors. */
>
>  /* Broadcom ('B') cores. */
> -AARCH64_CORE("vulcan", vulcan, cortexa57, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, vulcan, 0x42, 0x516)
> +AARCH64_CORE("vulcan", vulcan, cortexa57, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, vulcan, 0x42, 0x516, -1)
>
>  /* V8 big.LITTLE implementations.
>     */
>
> -AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03))
> -AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd08, 0xd03))
> -AARCH64_CORE("cortex-a73.cortex-a35", cortexa73cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd04))
> -AARCH64_CORE("cortex-a73.cortex-a53", cortexa73cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd03))
> +AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
> +AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd08, 0xd03), -1)
> +AARCH64_CORE("cortex-a73.cortex-a35", cortexa73cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd04), -1)
> +AARCH64_CORE("cortex-a73.cortex-a53", cortexa73cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd03), -1)

Why do the big.LITTLE entries get a single variant number, when you track
two variant numbers in the code below?

Thanks,
James
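
To illustrate the ordering concern raised above for thunderxt88p1 and
thunderxt88, here is a minimal sketch. It assumes the detection code takes
the first table entry whose implementer, part and variant match, with -1
acting as a wildcard. The struct, the two tables and first_match() are
hypothetical names made up for this example; they are not the driver's
actual code, which is not shown in this part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table entry; not the driver's real struct.  */
struct core_entry
{
  const char *name;
  unsigned implementer;
  unsigned part;
  int variant;  /* -1 is a wildcard that matches any variant.  */
};

/* Same relative order as in the patch: the variant-specific entry
   comes before the wildcard entry for the same part number.  */
static const struct core_entry patch_order[] =
{
  { "thunderxt88p1", 0x43, 0x0a1, 0 },
  { "thunderxt88",   0x43, 0x0a1, -1 },
};

/* What a well-meaning alphabetical sort would produce.  */
static const struct core_entry sorted_order[] =
{
  { "thunderxt88",   0x43, 0x0a1, -1 },
  { "thunderxt88p1", 0x43, 0x0a1, 0 },
};

/* Return the name of the first matching entry, or NULL if none match.
   This first-match policy is the assumption being illustrated.  */
static const char *
first_match (const struct core_entry *table, size_t n,
             unsigned implementer, unsigned part, int variant)
{
  for (size_t i = 0; i < n; i++)
    if (table[i].implementer == implementer
        && table[i].part == part
        && (table[i].variant == -1 || table[i].variant == variant))
      return table[i].name;
  return NULL;
}

/* Look up a pass 1 (variant 0) and a pass 2 (variant 1) T88 in both
   orderings.  */
static void
demo_ordering (void)
{
  assert (strcmp (first_match (patch_order, 2, 0x43, 0x0a1, 0),
                  "thunderxt88p1") == 0);
  assert (strcmp (first_match (patch_order, 2, 0x43, 0x0a1, 1),
                  "thunderxt88") == 0);
  /* With the wildcard entry first, pass 1 is misdetected.  */
  assert (strcmp (first_match (sorted_order, 2, 0x43, 0x0a1, 0),
                  "thunderxt88") == 0);
}
```

Under that assumption, the alphabetically sorted table can never select
thunderxt88p1, which is why a comment pinning the order next to the two
entries seems worthwhile.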