Because this is a race condition, many factors determine whether it locks up
the SoC, so my statement that it's a problem on SKUs with more than 4 cores
is only a generalization. Anything else that affects the timing of the SMM
relocation code on each core during MP init could mask the problem. Frankly,
it's a fluke that it wasn't caught during the original development of the
Broadwell-DE solution, and that is only because it can't be reproduced on
the Camelback Mountain CRBs with the stock 4-core SKU that all of them were
built with back in the day.

Serializing the SMM relocation code, which otherwise runs in parallel on all
cores, is what masks the race condition. That serialization can come either
from turning up the console output so that the printks in the code path are
executed, or from a call to wbinvd() in function smm_relocation_handler() in
file src/soc/intel/fsp_broadwell_de/smmrelocate.c. But again, this just masks
the underlying root cause.
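
For anyone wanting to try the wbinvd() workaround, here is a rough sketch of
its shape. It is not the actual coreboot source: the handler name, its single
parameter, and relocate_smm_for_cpu() are hypothetical placeholders, wbinvd()
is stubbed out so the sketch can run in user space (the real helper executes
the privileged WBINVD instruction), and putting the call at function entry is
an assumption, since the exact placement isn't stated here.

```c
#include <assert.h>
#include <stdio.h>

/* Counts stub invocations so the sketch can be checked in user space. */
static int wbinvd_calls;

/*
 * Stand-in for coreboot's wbinvd() helper. The real one executes the
 * privileged WBINVD instruction, which cannot run in a normal process.
 */
static void wbinvd(void)
{
	wbinvd_calls++;
}

/* Hypothetical placeholder for the existing per-core relocation work. */
static void relocate_smm_for_cpu(int cpu)
{
	printf("cpu %d: SMM relocation work would run here\n", cpu);
}

/*
 * Hypothetical, simplified handler. The real smm_relocation_handler()
 * takes different parameters and does much more.
 */
static void smm_relocation_handler_sketch(int cpu)
{
	assert(cpu >= 0);

	/* Workaround: placement at function entry is an assumption. */
	wbinvd();

	relocate_smm_for_cpu(cpu);
}
```

In the real tree the only change would be the single wbinvd() call inside the
existing handler; everything else above is scaffolding for illustration.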

My point is that the race condition itself exists in the code regardless of
which SKU you are using. The number of cores that execute the SMM relocation
code in parallel just increases the chance of a lockup. In our experience, a
12-core SKU locked up 100% of the time with the console level dialed down,
before we introduced the wbinvd() workaround. Others on this list have
reported similar results with other SKUs with more than 4 cores. But again,
with race conditions, many factors can affect whether or not a lockup occurs.
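
To illustrate the general idea only, here is a toy model of why serialization
hides this kind of bug and why more cores give it more room to bite. It
assumes a generic read-modify-write on one shared value, which is purely a
stand-in: the actual racing resource in smmrelocate.c has not been identified.
All names are hypothetical, and the interleaved schedule is a deliberate worst
case, not a model of real timing or of lockup probability.

```c
#include <assert.h>

#define MAX_CORES 64

/*
 * Interleaved schedule: every core reads the shared value before any core
 * writes it back, so all but one update is lost.
 */
static int run_interleaved(int cores)
{
	int shared = 0;
	int local[MAX_CORES];

	assert(cores > 0 && cores <= MAX_CORES);
	for (int i = 0; i < cores; i++)
		local[i] = shared;		/* step 1: read */
	for (int i = 0; i < cores; i++)
		shared = local[i] + 1;		/* step 2: write back */
	return shared;
}

/* Serialized schedule: each core finishes both steps before the next starts. */
static int run_serialized(int cores)
{
	int shared = 0;

	assert(cores > 0 && cores <= MAX_CORES);
	for (int i = 0; i < cores; i++) {
		int local = shared;		/* step 1: read */
		shared = local + 1;		/* step 2: write back */
	}
	return shared;
}

/* Number of updates lost under the worst-case interleaving. */
static int lost_updates(int cores)
{
	return run_serialized(cores) - run_interleaved(cores);
}
```

With one core nothing is lost, and the number of conflicting updates in the
worst case grows with the core count, while the serialized schedule always
comes out right. That matches the pattern described above, but it says nothing
about what actually goes wrong in the handler.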

I immediately recognized the last POST code dumped to the console in the
original message before it hung, from when we saw the same hang. That's the
last POST code you get before the MP init code takes off and ultimately runs
the SMM relocation handler code.

Bottom line is that at some point somebody should investigate the underlying
root cause and fix it right. At the moment, we are busy with other things
and don't have the bandwidth to look into it deeper, but at least we have a
pretty good idea of where the problem lies.

- Jay

Jay Talbott
Principal Consulting Engineer
SysPro Consulting, LLC
3057 E. Muirfield St.
Gilbert, AZ 85298
(480) 704-8045
(480) 445-9895 (FAX)
[email protected]
http://www.sysproconsulting.com

> -----Original Message-----
> From: Frans Hendriks [mailto:[email protected]]
> Sent: Thursday, February 28, 2019 1:13 AM
> To: [email protected]
> Cc: 'Jay Talbott'; [email protected]
> Subject: [coreboot] Re: Intel Xeon D-1577 (16-core)
> 
> This relocation is performed in a later stage, so some output is expected.
> Is (correct) microcode included for D-1577?
> 
> I assume wbinvd() must be included in smm_relocation() function.
> (We have implemented coreboot on several Broadwell-DE boards with >4 cores
> without using wbinvd() or 8:SPEW.)
> 
> Best regards,
> Frans Hendriks
> Eltan B.V.
> 
> 
> -----Original Message-----
> From: Jay Talbott [mailto:[email protected]]
> Sent: Wednesday, February 27, 2019 22:19
> To: [email protected]; [email protected]
> Subject: [coreboot] Re: Intel Xeon D-1577 (16-core)
> 
> There's a bug in the SMM relocation code for Broadwell-DE that causes a race
> condition resulting in the SoC locking up during the MP init. With the stock
> 4-core SKU that comes in most of the Camelback Mountain CRBs, it's not a
> problem, which is why Intel didn't find it when they developed the original
> coreboot implementation for the CRB. But as has been reported previously on
> this list, it becomes a problem with more than 4 cores. We actually have an
> open ticket with Intel to see if they are willing to diagnose the root cause
> and fix it, but I have zero expectation that any action will ever be done on
> their part at this point.
> 
> If you turn up your console output level to 8:SPEW, the problem will go
> away, as the extra printks that get enabled in that case result in
> serialization of the SMM relocation code on each core, thus masking the race
> condition (try this first!).
> 
> Another workaround is to insert a call to wbinvd() in function
> smm_relocation_handler() in file
> src/soc/intel/fsp_broadwell_de/smmrelocate.c. This will also result in
> serialization that masks the race condition (fixed it for us on a 12-core
> SKU) without needing to have the console turned all the way up.
> 
> At some point somebody needs to dig into the actual code in smmrelocate.c
> and identify the root cause of the actual race condition. We just haven't
> had the time to do any further investigation into the root cause since we
> have a working workaround.
> 
> Hope that helps...
> 
> - Jay
> 
> Jay Talbott
> Principal Consulting Engineer
> SysPro Consulting, LLC
> 3057 E. Muirfield St.
> Gilbert, AZ 85298
> (480) 704-8045
> (480) 445-9895 (FAX)
> [email protected]
> http://www.sysproconsulting.com
> 
> 
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]]
> > Sent: Wednesday, February 27, 2019 12:46 PM
> > To: [email protected]
> > Subject: [coreboot] Intel Xeon D-1577 (16-core)
> >
> > Hello,
> >
> > I got a daughter-card (DC) based on Intel's Camelback Mountain CRB.
> > Coreboot won't boot when the DC is populated with a 16-core Xeon D-1577
> > processor. Nothing is printed in the boot process, so it doesn't seem to be
> > getting very far. However, if I load/program the boot SPI with AMI BIOS
> > instead of coreboot, then everything is hunky-dory. It boots up all the way
> > into Linux (see below for the platform information when AMI is loaded). In
> > addition, if the DC is populated with a 4-core D-1527 or 2-core D-1508, then
> > coreboot has no issues (see below for info).
> >
> > Is there any configuration that I need to change in coreboot to support the
> > D-1577?
> >
> > thanks!
> >
> > ## AMI BIOS
> > ##
> > @unassigned:~$ inxi -F
> > System:    Host: unassigned Kernel: 4.13-platina-mk1 x86_64 (64 bit)
> >            Console: tty 0 Distro: Debian GNU/Linux 8
> > Machine:   Mobo: Default string model: Default string v: Default string
> >            Bios: American Megatrends v: 5.11 date: 05/31/2017
> > CPU:       Octa core Intel Xeon D-1577 (-HT-MCP-) cache: 24576 KB
> >            Clock Speeds: 1: 1300 MHz 2: 1300 MHz 3: 1300 MHz 4: 1300 MHz
> >            5: 1300 MHz 6: 1300 MHz 7: 1300 MHz 8: 1300 MHz
> > Graphics:  Card: Failed to Detect Video Card!
> >            Display Server: N/A driver: N/A
> >            tty size: 80x24 Advanced Data: N/A out of X
> > Network:   Card-1: Broadcom Device b960
> >            IF: N/A state: N/A speed: N/A duplex: N/A mac: N/A
> >            Card-2: Broadcom Device b960
> >            IF: N/A state: N/A speed: N/A duplex: N/A mac: N/A
> >            Card-3: Intel Device 15ab driver: ixgbe
> >            IF: eth1 state: down mac: 00:a0:c9:00:00:00
> >            Card-4: Intel Device 15ab driver: ixgbe
> >            IF: eth2 state: down mac: 34:12:78:56:01:00
> >            Card-5: Intel I210 Gigabit Network Connection driver: igb
> >            IF: eth0 state: up speed: 1000 Mbps duplex: full
> >            mac: 50:18:4c:00:16:a1
> > Drives:    HDD Total Size: 520.1GB (4.0% used)
> >            ID-1: /dev/sda model: TS512ZBTDM1500T size: 512.1GB
> >            ID-2: USB /dev/sdb model: Echo size: 8.0GB
> > Partition: ID-1: / size: 451G used: 942M (1%) fs: ext4 dev: /dev/sda1
> >            ID-2: swap-1 size: 20.75GB used: 0.00GB (0%) fs: swap dev: /dev/sda5
> > RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
> > Sensors:   System Temperatures: cpu: 56.0C mobo: N/A
> >            Fan Speeds (in rpm): cpu: N/A
> > Info:      Processes: 118 Uptime: 17:49 Memory: 110.9/32087.7MB
> >            Init: systemd runlevel: 5 Client: Shell (bash) inxi: 2.1.28
> >
> > ## Coreboot on 4-core D-1527:
> > ##
> > root@invader0:~# POST: 0x4a
> > romstage_main_continue status: 0  hob_list_ptr: 7f100000
> > FSP Status: 0x0
> > POST: 0x4b
> > POST: 0x4c
> > POST: 0x4d
> > CBMEM:
> > IMD: root @ 7efff000 254 entries.
> > IMD: root @ 7effec00 62 entries.
> > POST: 0x4e
> > CBFS: 'Master Header Locator' located CBFS at [800100:ffffc0)
> > CBFS: Locating 'fallback/ramstage'
> > CBFS: Found @ offset 33b80 size d857
> >
> >
> > coreboot-v0.4-5-g0e4829a5b5 Wed Jun 20 18:38:46 UTC 2018 ramstage starting...
> > POST: 0x39
> > Moving GDT to 7effe9e0...ok
> > POST: 0x80
> >
> > ##
> > root@invader0:~# inxi -F
> > System:    Host: invader0 Kernel: 4.13-platina-mk1 x86_64 (64 bit)
> >            Console: tty 0 Distro: Debian GNU/Linux 8
> > Machine:   Mobo: Intel model: Camelback Mountain Platina DC v: 1.0 serial: 123456789
> >            Bios: coreboot v: v0.4-5-g0e4829a5b5 date: 06/20/2018
> > CPU:       Quad core Intel Xeon D-1527 (-HT-MCP-) cache: 6144 KB
> >            Clock Speeds: 1: 2194 MHz 2: 2194 MHz 3: 2194 MHz 4: 2194 MHz
> >            5: 2194 MHz 6: 2194 MHz 7: 2194 MHz 8: 2194 MHz
> > Graphics:  Card: Failed to Detect Video Card!
> >            Display Server: N/A driver: N/A
> >            tty size: 80x24 Advanced Data: N/A for root out of X
> > Network:   Card-1: Broadcom Device b960 driver: vfio-pci
> >            IF: N/A state: N/A speed: N/A duplex: N/A mac: N/A
> >            Card-2: Broadcom Device b960 driver: vfio-pci
> >            IF: N/A state: N/A speed: N/A duplex: N/A mac: N/A
> >            Card-3: Intel Device 15ab driver: ixgbe
> >            IF: eth1 state: down mac: 50:18:4c:00:16:a2
> >            Card-4: Intel Device 15ab driver: ixgbe
> >            IF: eth2 state: down mac: 50:18:4c:00:16:a3
> >            Card-5: Intel I210 Gigabit Network Connection driver: igb
> >            IF: eth0 state: up speed: 1000 Mbps duplex: full
> >            mac: 50:18:4c:00:16:a1
> > Drives:    HDD Total Size: 136.1GB (5.0% used)
> >            ID-1: /dev/sda model: SanDisk_SD8SMAT1 size: 128.0GB
> >            ID-2: USB /dev/sdb model: Echo size: 8.0GB
> > Partition: ID-1: / size: 98G used: 1.4G (2%) fs: ext4 dev: /dev/sda6
> >            ID-2: swap-1 size: 5.70GB used: 0.00GB (0%) fs: swap dev: /dev/sda5
> > RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
> > Sensors:   System Temperatures: cpu: 56.0C mobo: N/A
> >            Fan Speeds (in rpm): cpu: N/A
> > Info:      Processes: 133 Uptime: 1 min Memory: 149.1/16078.1MB
> >            Init: systemd runlevel: 5 Client: Shell (bash) inxi: 2.1.28
> > root@invader0:~#
> > _______________________________________________
> > coreboot mailing list -- [email protected]
> > To unsubscribe send an email to [email protected]