Hi all,

Does anyone have any ideas on this? Thanks.

+ gonglei, haibin
On 2016/10/17 15:51, Kefeng Wang wrote:
>
>
> On 2016/10/15 2:36, Andy Lutomirski wrote:
>> On Thu, Oct 13, 2016 at 11:14 PM, Kefeng Wang
>> <wangkefeng.w...@huawei.com> wrote:
>>> Hi all,
>>>
>>> We met BUG_ON in do_device_not_available(fpu exception handler) when run
>>> redhat7 in kvm guest,
>>> and there is no special test on this guest, only some network packet
>>> receipt and transmission.
>>>
>>> I checked the new kernel version, found this commit
>>> 4ecd16ec7059390b430af34bd8bc3ca2b5dcef9a
>>> Author: Andy Lutomirski <l...@kernel.org>
>>> Date: Sun Jan 24 14:38:06 2016 -0800
>>>
>>> x86/fpu: Fix math emulation in eager fpu mode
>>>
>>> Systems without an FPU are generally old and therefore use lazy FPU
>>> switching. Unsurprisingly, math emulation in eager FPU mode is a
>>> bit buggy. Fix it.
>>>
>>> There were two bugs involving kernel code trying to use the FPU
>>> registers in eager mode even if they didn't exist and one BUG_ON()
>>> that was incorrect.
>>>
>>>
>>> The BUG_ON() is incorrect, but I have no idea about eager fpu, why the
>>> BUG_ON is incorrect?
>>> Should we backport the patch to v3.10, or is there some bugs in the
>>> qemu-kvm?
>>> Any reply will be appreciated.
>>
>> The BUG_ON was incorrect because you could hit it if FPU emulation was
>> enabled. But, unless you explicitly set the "eagerfpu=" option or you
>> have some really weird set of cpu flags, old kernels shouldn't have
>> hit it. Is the cpuinfo you pasted below from the guest? Also, could
>> you attach whatever dmesg has to say about FPU in a crashing guest?
>>
>> --Andy
>>
>
> Hi Andy and Rik, thanks for your quick response.
>
> Attach more information, and we have no special configuration for fpu, and
> only met this issue once(can't reproduce).
>
>
> 1) cmdline
>
> BOOT_IMAGE=/vmlinuz-3.10.0-229.20.1.x86_64 root=/dev/vda2 oops=panic
> softlockup_panic=1 net.ifnames=0 biosdevname=0 nmi_watchdog=1 selinux=0
> console=tty0 panic=3
>
>
> 2) virsh dumpxml 3 (only cpu parts)
> -----------------------
> <memory unit='KiB'>16000000</memory>
> <currentMemory unit='KiB'>16000000</currentMemory>
> <vcpu placement='static'>8</vcpu>
> <resource>
>   <partition>/machine</partition>
> </resource>
> <os>
>   <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>   <boot dev='cdrom'/>
> </os>
> <features>
>   <acpi/>
>   <apic/>
>   <pae/>
> </features>
> <cpu mode='host-passthrough'>
>   <topology sockets='1' cores='8' threads='1'/>
> </cpu>
> -----------------------
>
> 3) The host os cpuinfo
> -----------------------
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 45
> model name : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
> stepping : 7
> microcode : 1808
> cpu MHz : 2899.894
> cache size : 20480 KB
> physical id : 0
> siblings : 16
> core id : 0
> cpu cores : 8
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 13
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
> nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl
> vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt
> tsc_deadline_timer aes xsave avx lahf_lm arat epb xsaveopt pln
> pts dtherm tpr_shadow vnmi flexpriority ept vpid
> bogomips : 5799.78
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
>
> 4) The guest os cpuinfo
>
>>> [2] The /proc/cpuinfo shows below(show only the first cpu0),
>>> --------------------------------
>>> localhost:~ # cat /proc/cpuinfo
>>> processor : 0
>>> vendor_id : GenuineIntel
>>> cpu family : 6
>>> model : 45
>>> model name : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
>>> stepping : 7
>>> microcode : 0x1
>>> cpu MHz : 2899.992
>>> cache size : 4096 KB
>>> physical id : 0
>>> siblings : 8
>>> core id : 0
>>> cpu cores : 8
>>> apicid : 0
>>> initial apicid : 0
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 13
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>>> cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm
>>> constant_tsc arch_perfmon rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16
>>> pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx
>>> hypervisor lahf_lm xsaveopt
>>> bogomips : 5799.98
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 42 bits physical, 48 bits virtual
>>> power management:
>>>
>>
>
> 5) parts of bootmsg
>
>
> [ 0.000000] Booting paravirtualized kernel on KVM
> [ 0.000000] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:8 nr_cpu_ids:8
> nr_node_ids:1
> [ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff88040f400000 s82816 r8192
> d23680 u262144
> [ 0.000000] KVM setup async PF for cpu 0
> [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total
> pages: 3933317
> [ 0.000000] Policy zone: Normal
> [ 0.000000] Kernel command line:
> BOOT_IMAGE=/vmlinuz-3.10.0-229.20.1.x86_64 root=/dev/vda2 oops=panic
> softlockup_panic=1 net.ifnames=0 biosdevname=0 nmi_watchdog=1 selinux=0
> console=tty0 panic=3
> [ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
> [ 0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340
> [ 0.000000] Checking aperture...
> [ 0.000000] No AGP bridge found
> [ 0.000000] Memory: 15375712k/17032192k available (6255k kernel code,
> 1049096k absent, 607384k reserved, 4184k data, 1604k init)
> [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=8.
> [ 0.000000] Experimental no-CBs for all CPUs
> [ 0.000000] Experimental no-CBs CPUs: 0-7.
> [ 0.000000] NR_IRQS:327936 nr_irqs:744 16
> [ 0.000000] Console: colour VGA+ 80x25
> [ 0.000000] console [tty0] enabled
> [ 0.000000] allocated 63963136 bytes of page_cgroup
> [ 0.000000] please try 'cgroup_disable=memory' option if you don't want
> memory cgroups
> [ 0.000000] tsc: Detected 2899.992 MHz processor
> [ 0.002000] Calibrating delay loop (skipped) preset value.. 5799.98
> BogoMIPS (lpj=2899992)
> [ 0.002006] pid_max: default: 32768 minimum: 301
> [ 0.003029] Security Framework initialized
> [ 0.003584] SELinux: Disabled at boot.
> [ 0.005623] Dentry cache hash table entries: 2097152 (order: 12, 16777216
> bytes)
> [ 0.009697] Inode-cache hash table entries: 1048576 (order: 11, 8388608
> bytes)
> [ 0.011961] Mount-cache hash table entries: 4096
> [ 0.012178] Initializing cgroup subsys memory
> [ 0.013011] Initializing cgroup subsys devices
> [ 0.013573] Initializing cgroup subsys freezer
> [ 0.014004] Initializing cgroup subsys net_cls
> [ 0.015003] Initializing cgroup subsys blkio
> [ 0.015558] Initializing cgroup subsys perf_event
> [ 0.016007] Initializing cgroup subsys hugetlb
> [ 0.016637] CPU: Physical Processor ID: 0
> [ 0.017003] CPU: Processor Core ID: 0
> [ 0.018004] mce: CPU supports 10 MCE banks
> [ 0.018597] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
> Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
> tlb_flushall_shift: 6
> [ 0.019169] Freeing SMP alternatives: 24k freed
> [ 0.023150] ACPI: Core revision 20130517
> [ 0.024384] ACPI: All ACPI Tables successfully acquired
> [ 0.025016] ftrace: allocating 23959 entries in 94 pages
> [ 0.031284] Enabling x2apic
> [ 0.031884] Enabled x2apic
> [ 0.032004] Switched APIC routing to physical x2apic.
> [ 0.033629] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 0.034002] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz (fam:
> 06, model: 2d, step
> [ 0.035052] Performance Events: 16-deep LBR, SandyBridge events, Intel
> PMU driver.
> [ 0.037935] perf_event_intel: PEBS disabled due to CPU errata, please
> upgrade microcode
> [ 0.038005] ... version: 2
> [ 0.038544] ... bit width: 48
> [ 0.039003] ... generic registers: 4
> [ 0.039546] ... value mask: 0000ffffffffffff
> [ 0.040003] ... max period: 000000007fffffff
> [ 0.040565] ... fixed-purpose events: 3
> [ 0.041002] ... event mask: 000000070000000f
> [ 0.042819] NMI watchdog: enabled on all CPUs, permanently consumes one
> hw-PMU counter.
> [ 0.042791] kvm-clock: cpu 1, msr 4:f889041, secondary cpu clock
> [ 0.055016] KVM setup async PF for cpu 1
> [ 0.055025] kvm-clock: cpu 2, msr 4:f889081, secondary cpu clock
> [ 0.067015] KVM setup async PF for cpu 2
> [ 0.067022] kvm-clock: cpu 3, msr 4:f8890c1, secondary cpu clock
> [ 0.079015] KVM setup async PF for cpu 3
> [ 0.079023] kvm-clock: cpu 4, msr 4:f889101, secondary cpu clock
> [ 0.091013] KVM setup async PF for cpu 4
> [ 0.091019] kvm-clock: cpu 5, msr 4:f889141, secondary cpu clock
> [ 0.103014] KVM setup async PF for cpu 5
> [ 0.103021] kvm-clock: cpu 6, msr 4:f889181, secondary cpu clock
> [ 0.115014] KVM setup async PF for cpu 6
> [ 0.043097] smpboot: Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 OK
> [ 0.115021] kvm-clock: cpu 7, msr 4:f8891c1, secondary cpu clock
> [ 0.127022] Brought up 8 CPUs
> [ 0.127015] KVM setup async PF for cpu 7
> [ 0.128005] smpboot: Total of 8 processors activated (46399.87 BogoMIPS)
> [ 0.129202] devtmpfs: initialized
>
> .
>
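In case it helps anyone comparing the host and guest cpuinfo quoted above, here is a rough sketch of how the two flag sets could be diffed. The helper names (parse_flags, diff_flags) are made up for illustration, and the two sample strings are only short excerpts of the flags quoted above, not the full lists. It assumes the flags are on a single line, as in a real /proc/cpuinfo, rather than wrapped as in this mail.

```python
def parse_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found


def diff_flags(host_text, guest_text):
    """Return (flags only in guest, flags only in host), each sorted."""
    host = parse_flags(host_text)
    guest = parse_flags(guest_text)
    return sorted(guest - host), sorted(host - guest)


# Shortened excerpts of the flags quoted in this thread (illustrative only).
HOST_EXCERPT = "flags : fpu fxsr sse sse2 vmx xsave avx xsaveopt\n"
GUEST_EXCERPT = "flags : fpu fxsr sse sse2 eagerfpu xsave avx hypervisor xsaveopt\n"

guest_only, host_only = diff_flags(HOST_EXCERPT, GUEST_EXCERPT)
print("guest only:", guest_only)
print("host only:", host_only)
```

With the full files one would pass the contents of /proc/cpuinfo from each side instead of the excerpts. On the excerpts, eagerfpu and hypervisor show up only on the guest side, which matches the guest cpuinfo above.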