On Mon, Jul 28, 2014 at 12:28:39PM -0400, Josef Bacik wrote:
> We have these processors with this Cluster on die feature which shares numa
> nodes between cores on different sockets.

Uhm, what?! I know AMD has chips that have two nodes per package, but
what you say doesn't make sense.

> When booting up we were getting this
> error with COD enabled (this is a 4 socket 12 core per CPU box)
>
> smpboot: Booting Node 0, Processors #1 #2 #3 #4 #5 OK
> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x82()
> sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0].
> Ignoring dependency.
> smpboot: Booting Node 1, Processors #6
> Modules linked in:
> CPU: 6 PID: 0 Comm: swapper/6 Not tainted 3.10.39-31_fbk12_01013_ga2de9bf #1
> Hardware name: Quanta Leopard-DDR3/Leopard-DDR3, BIOS F06_3A03.08 05/24/2014
>  ffffffff810971d4 ffff8802748d3e48 0000000000000009 ffff8802748d3df8
>  ffffffff815bba59 ffff8802748d3e38 ffffffff8103b02b ffff8802748d3e28
>  0000000000000001 000000000000b010 0000000000012580 0000000000000000
> Call Trace:
>  [<ffffffff810971d4>] ? print_modules+0x54/0xa0
>  [<ffffffff815bba59>] dump_stack+0x19/0x1b
>  [<ffffffff8103b02b>] warn_slowpath_common+0x6b/0xa0
>  [<ffffffff8103b101>] warn_slowpath_fmt+0x41/0x50
>  [<ffffffff815ada56>] topology_sane.isra.2+0x6f/0x82
>  [<ffffffff815ade23>] set_cpu_sibling_map+0x380/0x42c
>  [<ffffffff815adfe7>] start_secondary+0x118/0x19a
> ---[ end trace 755dbfb52f761180 ]---
>  #7 #8 #9 #10 #11 OK
>
> and then the /proc/cpuinfo would show "cores: 6" instead of "cores: 12"
> because the sibling map doesn't get set right.

Yeah, looks like your topology setup is wrecked alright.

> This patch fixes this.

No, as you say, this patch just makes the warning go away, you still
have a royally fucked topology setup.

> Now I realize
> this is probably not the correct fix but I'm an FS guy and I don't understand
> this stuff.

:-)

> Looking at the cpuflags with COD on and off there appears to be no
> difference. The only difference I can spot is with it on we have 4 numa nodes
> and with it off we have 2, but that seems like a flakey check at best to add.
> I'm open to suggestions on how to fix this properly. Thanks,

Got a link that explains this COD nonsense? Google gets me something
about Intel SSSC, but nothing that explains your BIOS knob.

I suspect your BIOS is buggy and doesn't properly modify the CPUID
topology data.
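For reference, a rough userspace model of what the warning in the trace is complaining about. This is a sketch under stated assumptions, not the code in arch/x86/kernel/smpboot.c: struct cpu_model, mc_sibling_sane() and count_core_siblings() are hypothetical names. The idea is that CPUs sharing a CPUID-derived package ID are treated as mc-siblings only if the firmware also puts them on the same NUMA node; when the two disagree, the dependency is dropped and the core count shrinks.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-CPU topology record (not a kernel structure). */
struct cpu_model {
	int package_id; /* package as derived from CPUID topology data */
	int node;       /* NUMA node as reported by firmware */
};

/*
 * Simplified sibling check: CPUs in the same package count as
 * mc-siblings only if they are also on the same node; otherwise
 * print a warning and ignore the dependency.
 */
static bool mc_sibling_sane(const struct cpu_model *cpus, int cpu, int other)
{
	if (cpus[cpu].package_id != cpus[other].package_id)
		return false; /* different packages: not siblings at all */
	if (cpus[cpu].node != cpus[other].node) {
		fprintf(stderr,
			"CPU #%d's mc-sibling CPU #%d is not on the same node! "
			"[node: %d != %d]. Ignoring dependency.\n",
			cpu, other, cpus[cpu].node, cpus[other].node);
		return false;
	}
	return true;
}

/* Count the CPUs that end up as core siblings of 'cpu' (itself included). */
static int count_core_siblings(const struct cpu_model *cpus, int ncpus, int cpu)
{
	int count = 0;

	for (int i = 0; i < ncpus; i++) {
		if (i == cpu || mc_sibling_sane(cpus, i, cpu))
			count++;
	}
	return count;
}
```

Feeding it 12 CPUs in one package, with CPUs 0-5 on node 0 and CPUs 6-11 on node 1 (the split the trace implies), gives CPU 0 six core siblings instead of twelve, which lines up with the "cores: 6" symptom. Silencing the check only hides the mismatch between the CPUID package data and the node layout; it does not make them agree.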

