Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-21 Thread Kevin Stange
For some additional context, all my hardware is Supermicro and working
great on 4.9.13 - 26.  I have dom0_max_vcpus=2 because of issues I was
having with deadlocked CPU cores before setting that option on 3.18
kernels.  In my experience setting that value doesn't cause any
detriment to the dom0, which isn't doing most of the work anyway.
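
As an illustration of the setting mentioned above, here is a minimal Python sketch that checks whether a Xen hypervisor command line carries a dom0_max_vcpus cap. The function name dom0_vcpu_cap and the sample command line are hypothetical; only the dom0_max_vcpus option itself comes from this thread.

```python
def dom0_vcpu_cap(xen_cmdline):
    """Return the dom0_max_vcpus value from a Xen command line, or None if unset.

    The value is returned as a string because the option may also be
    written as a range rather than a single integer.
    """
    for token in xen_cmdline.split():
        key, sep, value = token.partition("=")
        if key == "dom0_max_vcpus" and sep:
            return value
    return None


# Hypothetical sample command line, for illustration only.
sample = "dom0_mem=2048M dom0_max_vcpus=2 loglvl=all"
print(dom0_vcpu_cap(sample))
```

The string to check would typically be the hypervisor line from the grub configuration; where exactly that lives depends on the distribution and grub version, so it is left to the caller here.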

These are all the motherboards I'm running the kernel stably on:

Supermicro X8DT3
Supermicro X8DT6
Supermicro X9DRD-iF/LF
Supermicro X9DRT
Supermicro X9SCL/X9SCM

I'm on CentOS 6 across the board.

On 04/21/2017 05:01 AM, Mark L Sung wrote:
> Huh, it seems there are still stability issues with
> "4.9.2-26.el7.x86_64"; I have recently heard of many issues related to
> Supermicro boards! :-(
> 
> Peace!!!
> 
> On Fri, Apr 21, 2017 at 9:40 AM, Anderson, Dave wrote:
> 
> Good news/bad news testing the new kernel on CentOS7 with my now
> notoriously finicky machines:
> 
> Good news: 4.9.23-26.el7 (grabbed today via yum update) isn't any
> worse than 4.9.13-22 was on my xen hosts (as far as I can tell so
> far at least)
> 
> Bad news: It isn't any better than 4.9.13 was for me either, if I
> don't set vcpu limit in the grub/xen config, it still panics like so:
> 
> [6.716016] CPU: Physical Processor ID: 0
> [6.720199] CPU: Processor Core ID: 0
> [6.724046] mce: CPU supports 2 MCE banks
> [6.728239] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
> [6.733884] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
> [6.740770] Freeing SMP alternatives memory: 32K
> (821a8000 - 821b)
> [6.750638] ftrace: allocating 34344 entries in 135 pages
> [6.771888] smpboot: Max logical packages: 1
> [6.776363] VPMU disabled by hypervisor.
> [6.780479] Performance Events: SandyBridge events, PMU not
> available due to virtualization, using software events only.
> [6.792237] NMI watchdog: disabled (cpu0): hardware events not
> enabled
> [6.798943] NMI watchdog: Shutting down hard lockup detector on
> all cpus
> [6.805949] installing Xen timer for CPU 1
> [6.810659] installing Xen timer for CPU 2
> [6.815317] installing Xen timer for CPU 3
> [6.819947] installing Xen timer for CPU 4
> [6.824618] installing Xen timer for CPU 5
> [6.829282] installing Xen timer for CPU 6
> [6.833935] installing Xen timer for CPU 7
> [6.838565] installing Xen timer for CPU 8
> [6.843110] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
> [6.849475] [ cut here ]
> [6.854091] kernel BUG at arch/x86/kernel/cpu/common.c:997!
> [6.855864] random: fast init done
> [6.863070] invalid opcode:  [#1] SMP
> [6.867088] Modules linked in:
> [6.870168] CPU: 8 PID: 0 Comm: swapper/8 Not tainted
> 4.9.23-26.el7.x86_64 #1
> [6.877298] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a
> 08/04/2015
> [6.883920] task: 880058a6a5c0 task.stack: c900400c
> [6.889840] RIP: e030:[]  []
> identify_secondary_cpu+0x57/0x80
> [6.898756] RSP: e02b:c900400c3f08  EFLAGS: 00010086
> [6.904069] RAX: ffe4 RBX: 88005d80a020 RCX:
> 81e5ffc8
> [6.911201] RDX: 0001 RSI: 0005 RDI:
> 0005
> [6.918335] RBP: c900400c3f18 R08: 00ce R09:
> 
> [6.925466] R10: 0005 R11: 0006 R12:
> 0008
> [6.932599] R13:  R14:  R15:
> 
> [6.939735] FS:  () GS:88005d80()
> knlGS:
> [6.947819] CS:  e033 DS: 002b ES: 002b CR0: 80050033
> [6.953565] CR2:  CR3: 01e07000 CR4:
> 00042660
> [6.960696] Stack:
> [6.962731]  0008  c900400c3f28
> 8104ebce
> [6.970205]  c900400c3f40 81029855 
> c900400c3f50
> [6.977691]  810298d0  
> 
> [6.985164] Call Trace:
> [6.987626]  [] smp_store_cpu_info+0x3e/0x40
> [6.993480]  [] cpu_bringup+0x35/0x90
> [6.998700]  [] cpu_bringup_and_idle+0x20/0x40
> [7.004706] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75
> 1c 0f b7 bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c
> 5d c3 0f 0b <0f> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 90
> d3 ca 81
> [7.024976] RIP  []
> identify_secondary_cpu+0x57/0x80
> [7.031528]  RSP 
> [7.035032] ---[ end trace f2a8d75941398d9f ]---
> [7.039658] Kernel panic - not syncing: 

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-21 Thread Mark L Sung
Huh, it seems there are still stability issues with
"4.9.2-26.el7.x86_64"; I have recently heard of many issues related to
Supermicro boards! :-(

Peace!!!

On Fri, Apr 21, 2017 at 9:40 AM, Anderson, Dave wrote:

> Good news/bad news testing the new kernel on CentOS7 with my now
> notoriously finicky machines:
>
> Good news: 4.9.23-26.el7 (grabbed today via yum update) isn't any worse
> than 4.9.13-22 was on my xen hosts (as far as I can tell so far at least)
>
> Bad news: It isn't any better than 4.9.13 was for me either, if I don't
> set vcpu limit in the grub/xen config, it still panics like so:
>
> [6.716016] CPU: Physical Processor ID: 0
> [6.720199] CPU: Processor Core ID: 0
> [6.724046] mce: CPU supports 2 MCE banks
> [6.728239] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
> [6.733884] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
> [6.740770] Freeing SMP alternatives memory: 32K (821a8000 -
> 821b)
> [6.750638] ftrace: allocating 34344 entries in 135 pages
> [6.771888] smpboot: Max logical packages: 1
> [6.776363] VPMU disabled by hypervisor.
> [6.780479] Performance Events: SandyBridge events, PMU not available
> due to virtualization, using software events only.
> [6.792237] NMI watchdog: disabled (cpu0): hardware events not enabled
> [6.798943] NMI watchdog: Shutting down hard lockup detector on all cpus
> [6.805949] installing Xen timer for CPU 1
> [6.810659] installing Xen timer for CPU 2
> [6.815317] installing Xen timer for CPU 3
> [6.819947] installing Xen timer for CPU 4
> [6.824618] installing Xen timer for CPU 5
> [6.829282] installing Xen timer for CPU 6
> [6.833935] installing Xen timer for CPU 7
> [6.838565] installing Xen timer for CPU 8
> [6.843110] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
> [6.849475] [ cut here ]
> [6.854091] kernel BUG at arch/x86/kernel/cpu/common.c:997!
> [6.855864] random: fast init done
> [6.863070] invalid opcode:  [#1] SMP
> [6.867088] Modules linked in:
> [6.870168] CPU: 8 PID: 0 Comm: swapper/8 Not tainted
> 4.9.23-26.el7.x86_64 #1
> [6.877298] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
> [6.883920] task: 880058a6a5c0 task.stack: c900400c
> [6.889840] RIP: e030:[]  []
> identify_secondary_cpu+0x57/0x80
> [6.898756] RSP: e02b:c900400c3f08  EFLAGS: 00010086
> [6.904069] RAX: ffe4 RBX: 88005d80a020 RCX:
> 81e5ffc8
> [6.911201] RDX: 0001 RSI: 0005 RDI:
> 0005
> [6.918335] RBP: c900400c3f18 R08: 00ce R09:
> 
> [6.925466] R10: 0005 R11: 0006 R12:
> 0008
> [6.932599] R13:  R14:  R15:
> 
> [6.939735] FS:  () GS:88005d80()
> knlGS:
> [6.947819] CS:  e033 DS: 002b ES: 002b CR0: 80050033
> [6.953565] CR2:  CR3: 01e07000 CR4:
> 00042660
> [6.960696] Stack:
> [6.962731]  0008  c900400c3f28
> 8104ebce
> [6.970205]  c900400c3f40 81029855 
> c900400c3f50
> [6.977691]  810298d0  
> 
> [6.985164] Call Trace:
> [6.987626]  [] smp_store_cpu_info+0x3e/0x40
> [6.993480]  [] cpu_bringup+0x35/0x90
> [6.998700]  [] cpu_bringup_and_idle+0x20/0x40
> [7.004706] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f
> b7 bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b
> <0f> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 90 d3 ca 81
> [7.024976] RIP  [] identify_secondary_cpu+0x57/0x80
> [7.031528]  RSP 
> [7.035032] ---[ end trace f2a8d75941398d9f ]---
> [7.039658] Kernel panic - not syncing: Attempted to kill the idle task!
>
> So...other than my workaround...that still works...not sure what else I
> can provide in the way of feedback/testing. But if you want anything else
> gathered, let me know.
>
> Thanks,
> -Dave
>
> --
> Dave Anderson
>
>
> > On Apr 19, 2017, at 10:33 AM, Johnny Hughes  wrote:
> >
> > On 04/19/2017 12:18 PM, PJ Welsh wrote:
> >>
> >> On Wed, Apr 19, 2017 at 5:40 AM, Johnny Hughes wrote:
> >>
> >>On 04/18/2017 12:39 PM, PJ Welsh wrote:
> >>> Here is something interesting... I went through the BIOS options and
> >>> found that one R710 that *is* functioning only differed in that "Logical
> >>> Processor"/Hyperthreading was *enabled* while the one that is *not*
> >>> functioning had HT *disabled*. Enabled Logical Processor and the system
> >>> starts without issue! I've rebooted 3 times now without issue.
> >>> Dell R710 BIOS version 6.4.0
> >>> 2x 

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-20 Thread Anderson, Dave
Good news/bad news testing the new kernel on CentOS7 with my now notoriously 
finicky machines:

Good news: 4.9.23-26.el7 (grabbed today via yum update) isn't any worse than 
4.9.13-22 was on my xen hosts (as far as I can tell so far at least)

Bad news: It isn't any better than 4.9.13 was for me either; if I don't set a 
vcpu limit in the grub/xen config, it still panics like so:

[6.716016] CPU: Physical Processor ID: 0
[6.720199] CPU: Processor Core ID: 0
[6.724046] mce: CPU supports 2 MCE banks
[6.728239] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
[6.733884] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
[6.740770] Freeing SMP alternatives memory: 32K (821a8000 - 
821b)
[6.750638] ftrace: allocating 34344 entries in 135 pages
[6.771888] smpboot: Max logical packages: 1
[6.776363] VPMU disabled by hypervisor.
[6.780479] Performance Events: SandyBridge events, PMU not available due to 
virtualization, using software events only.
[6.792237] NMI watchdog: disabled (cpu0): hardware events not enabled
[6.798943] NMI watchdog: Shutting down hard lockup detector on all cpus
[6.805949] installing Xen timer for CPU 1
[6.810659] installing Xen timer for CPU 2
[6.815317] installing Xen timer for CPU 3
[6.819947] installing Xen timer for CPU 4
[6.824618] installing Xen timer for CPU 5
[6.829282] installing Xen timer for CPU 6
[6.833935] installing Xen timer for CPU 7
[6.838565] installing Xen timer for CPU 8
[6.843110] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[6.849475] [ cut here ]
[6.854091] kernel BUG at arch/x86/kernel/cpu/common.c:997!
[6.855864] random: fast init done
[6.863070] invalid opcode:  [#1] SMP
[6.867088] Modules linked in:
[6.870168] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.23-26.el7.x86_64 #1
[6.877298] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
[6.883920] task: 880058a6a5c0 task.stack: c900400c
[6.889840] RIP: e030:[]  [] 
identify_secondary_cpu+0x57/0x80
[6.898756] RSP: e02b:c900400c3f08  EFLAGS: 00010086
[6.904069] RAX: ffe4 RBX: 88005d80a020 RCX: 81e5ffc8
[6.911201] RDX: 0001 RSI: 0005 RDI: 0005
[6.918335] RBP: c900400c3f18 R08: 00ce R09: 
[6.925466] R10: 0005 R11: 0006 R12: 0008
[6.932599] R13:  R14:  R15: 
[6.939735] FS:  () GS:88005d80() 
knlGS:
[6.947819] CS:  e033 DS: 002b ES: 002b CR0: 80050033
[6.953565] CR2:  CR3: 01e07000 CR4: 00042660
[6.960696] Stack:
[6.962731]  0008  c900400c3f28 
8104ebce
[6.970205]  c900400c3f40 81029855  
c900400c3f50
[6.977691]  810298d0   

[6.985164] Call Trace:
[6.987626]  [] smp_store_cpu_info+0x3e/0x40
[6.993480]  [] cpu_bringup+0x35/0x90
[6.998700]  [] cpu_bringup_and_idle+0x20/0x40
[7.004706] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 
bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 
0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 90 d3 ca 81 
[7.024976] RIP  [] identify_secondary_cpu+0x57/0x80
[7.031528]  RSP 
[7.035032] ---[ end trace f2a8d75941398d9f ]---
[7.039658] Kernel panic - not syncing: Attempted to kill the idle task!

So...other than my workaround...that still works...not sure what else I can 
provide in the way of feedback/testing. But if you want anything else gathered, 
let me know.
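
For comparing panics like the one above across kernel builds, a small Python sketch can pull out the key fields of the oops. This is only an illustration: the function name summarize_oops and the regular expressions are assumptions written against the log format shown in this message, not part of any kernel tooling.

```python
import re

# Patterns written against the log lines quoted in this thread.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
CPU_RE = re.compile(r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")


def summarize_oops(log_text):
    """Collect the BUG location and the faulting CPU/task from an oops log."""
    summary = {}
    for line in log_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            summary["file"] = bug.group("file")
            summary["line"] = int(bug.group("line"))
        cpu = CPU_RE.search(line)
        if cpu and "cpu" not in summary:
            summary["cpu"] = int(cpu.group("cpu"))
            summary["comm"] = cpu.group("comm")
    return summary


log = """\
[6.716016] CPU: Physical Processor ID: 0
[6.854091] kernel BUG at arch/x86/kernel/cpu/common.c:997!
[6.870168] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.23-26.el7.x86_64 #1
"""
print(summarize_oops(log))
```

Run against the 4.9.20-26 and 4.9.23-26 logs in this thread, both would report the same source location (common.c:997) and the same CPU (8), which matches the observation that the newer kernel dies the same way.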

Thanks,
-Dave

--
Dave Anderson


> On Apr 19, 2017, at 10:33 AM, Johnny Hughes  wrote:
> 
> On 04/19/2017 12:18 PM, PJ Welsh wrote:
>> 
>> On Wed, Apr 19, 2017 at 5:40 AM, Johnny Hughes wrote:
>> 
>>On 04/18/2017 12:39 PM, PJ Welsh wrote:
>>> Here is something interesting... I went through the BIOS options and
>>> found that one R710 that *is* functioning only differed in that "Logical
>>> Processor"/Hyperthreading was *enabled* while the one that is *not*
>>> functioning had HT *disabled*. Enabled Logical Processor and the system
>>> starts without issue! I've rebooted 3 times now without issue.
>>> Dell R710 BIOS version 6.4.0
>>> 2x Intel(R) Xeon(R) CPU L5639  @ 2.13GHz
>>> 4.9.20-26.el7.x86_64 #1 SMP Tue Apr 4 11:19:26 CDT 2017 x86_64 x86_64
>>> x86_64 GNU/Linux
>>> 
>> 
>>Outstanding .. I have now released a 4.9.23-26.el6 and .el7 to the
>>system as normal updates.  It should be available later today.
>> 
>>
>> 
>> 
>> I've verified with a second Dell R710 that disabling
>> Hyperthreading/Logical Processor causes the 

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-19 Thread Johnny Hughes
On 04/19/2017 12:18 PM, PJ Welsh wrote:
> 
> On Wed, Apr 19, 2017 at 5:40 AM, Johnny Hughes wrote:
> 
> On 04/18/2017 12:39 PM, PJ Welsh wrote:
> > Here is something interesting... I went through the BIOS options and
> > found that one R710 that *is* functioning only differed in that "Logical
> > Processor"/Hyperthreading was *enabled* while the one that is *not*
> > functioning had HT *disabled*. Enabled Logical Processor and the system
> > starts without issue! I've rebooted 3 times now without issue.
> > Dell R710 BIOS version 6.4.0
> > 2x Intel(R) Xeon(R) CPU L5639  @ 2.13GHz
> > 4.9.20-26.el7.x86_64 #1 SMP Tue Apr 4 11:19:26 CDT 2017 x86_64 x86_64
> > x86_64 GNU/Linux
> >
> 
> Outstanding .. I have now released a 4.9.23-26.el6 and .el7 to the
> system as normal updates.  It should be available later today.
> 
> 
> 
>  
> I've verified with a second Dell R710 that disabling
> Hyperthreading/Logical Processor causes the primary xen booting kernel
> to fail and reboot. Consequently, enabling allows for the system to
> start as expected and without any issue:
> Current tested kernel was: 4.9.13-22.el7.x86_64 #1 SMP Sun Feb 26
> 22:15:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> 
> I just attempted an update and the 4.9.23-26 is not yet up. Does this
> update address the Hyperthreading issue in any way?
> 

I don't think so .. at least I did not specifically add anything to do so.

You can get it here for testing:

https://buildlogs.centos.org/centos/7/virt/x86_64/xen/

(or from /6/ as well for CentOS-6)

Not sure why it did not go out on the signing run .. will check that server.





___
CentOS-virt mailing list
CentOS-virt@centos.org
https://lists.centos.org/mailman/listinfo/centos-virt


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-19 Thread PJ Welsh
On Wed, Apr 19, 2017 at 5:40 AM, Johnny Hughes  wrote:

> On 04/18/2017 12:39 PM, PJ Welsh wrote:
> > Here is something interesting... I went through the BIOS options and
> > found that one R710 that *is* functioning only differed in that "Logical
> > Processor"/Hyperthreading was *enabled* while the one that is *not*
> > functioning had HT *disabled*. Enabled Logical Processor and the system
> > starts without issue! I've rebooted 3 times now without issue.
> > Dell R710 BIOS version 6.4.0
> > 2x Intel(R) Xeon(R) CPU L5639  @ 2.13GHz
> > 4.9.20-26.el7.x86_64 #1 SMP Tue Apr 4 11:19:26 CDT 2017 x86_64 x86_64
> > x86_64 GNU/Linux
> >
>
> Outstanding .. I have now released a 4.9.23-26.el6 and .el7 to the
> system as normal updates.  It should be available later today.
>
> 
>
>
I've verified with a second Dell R710 that disabling Hyperthreading/Logical
Processor causes the primary xen booting kernel to fail and reboot.
Conversely, enabling it allows the system to start as expected and
without any issue:
Current tested kernel was: 4.9.13-22.el7.x86_64 #1 SMP Sun Feb 26 22:15:59
UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I just attempted an update and the 4.9.23-26 is not yet up. Does this
update address the Hyperthreading issue in any way?

Thanks
PJ


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-18 Thread PJ Welsh
Here is something interesting... I went through the BIOS options and found
that one R710 that *is* functioning only differed in that "Logical
Processor"/Hyperthreading was *enabled* while the one that is *not*
functioning had HT *disabled*. Enabled Logical Processor and the system
starts without issue! I've rebooted 3 times now without issue.
Dell R710 BIOS version 6.4.0
2x Intel(R) Xeon(R) CPU L5639  @ 2.13GHz
4.9.20-26.el7.x86_64 #1 SMP Tue Apr 4 11:19:26 CDT 2017 x86_64 x86_64
x86_64 GNU/Linux
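
To compare the Hyperthreading state of two hosts without going into the BIOS, one possible check is to compare the "siblings" and "cpu cores" fields of /proc/cpuinfo text. The sketch below is an assumption-laden illustration: the function name hyperthreading_enabled is hypothetical, the sample values are made up, and under a Xen dom0 the topology reported to the kernel may not reflect the real hardware, so the BIOS setting remains the authoritative source.

```python
def hyperthreading_enabled(cpuinfo_text):
    """Guess the HT state from /proc/cpuinfo-style text.

    Returns True if a package reports more siblings than cores,
    False if they are equal or fewer, None if either field is missing.
    """
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
        if siblings is not None and cores is not None:
            break
    if siblings is None or cores is None:
        return None
    return siblings > cores


# Made-up sample values, for illustration only.
sample = "processor\t: 0\nsiblings\t: 12\ncpu cores\t: 6\n"
print(hyperthreading_enabled(sample))
```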



On Tue, Apr 18, 2017 at 8:44 AM, PJ Welsh  wrote:

> Apologies: I installed the newer -26 kernel and had not rebooted into it.
> The grub2 menu item should have been "CentOS Linux (4.9.20-25.el7.x86_64) 7
> (Core)". I am currently restarting that remote affected system (unmodified
> grub2 entry first).
> Thanks
> PJ
>
> On Tue, Apr 18, 2017 at 8:39 AM, PJ Welsh  wrote:
>
>> Just to note, the same pattern happens on C7:
>> "CentOS Linux, with Xen hypervisor" = reboot
>> "CentOS Linux (4.9.20-26.el7.x86_64) 7 (Core)" = boot
>>
>> [root@XXX ~]# uname -a
>> Linux XXX 4.9.20-25.el7.x86_64 #1 SMP Fri Mar 31 08:53:28 CDT 2017 x86_64
>> x86_64 x86_64
>>
>> On Tue, Apr 18, 2017 at 8:36 AM, PJ Welsh  wrote:
>>
>>> There was a note that the non-Xen kernel at the same kernel version did
>>> indeed boot:
>>> "CentOS-6 4.9.20-26 kernel exhibits the same constant
>>> kernel-start-then-reboot issue when booting under the "CentOS Linux, with
>>> Xen hypervisor" grub2 menu option. However, it *does* properly boot under
>>> the "CentOS Linux (4.9.20-25.el7.x86_64) 7 (Core)" grub2 menu option!"
>>>
>>> Trying to get back into being able to test this more.
>>>
>>> Thanks
>>> PJ
>>>
>>> On Tue, Apr 18, 2017 at 8:30 AM, Johnny Hughes 
>>> wrote:
>>>
>>>> On 04/14/2017 03:26 PM, Anderson, Dave wrote:
>>>> > Sad to say that I already tested 4.9.20-26 from your repo
>>>> yesterday...it does look a little cleaner before it dies, but still dies. I
>>>> have not tested it with the vcpu=4 workaround, but I can tonight if you
>>>> would like. Relevant bits below:
>>>> >
>>>> > Loading Xen 4.6.3-12.el7 ...
>>>> > Loading Linux 4.9.20-26.el7.x86_64 ...
>>>> > Loading initial ramdisk ...
>>>> > [0.00] Linux version 4.9.20-26.el7.x86_64 (mockbuild@) (gcc
>>>> version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Apr 4 11:19:26
>>>> CDT 2017
>>>> >
>>>> >
>>>> >
>>>> > [6.195089] smpboot: Max logical packages: 1
>>>> > [6.199549] VPMU disabled by hypervisor.
>>>> > [6.203663] Performance Events: SandyBridge events, PMU not
>>>> available due to virtualization, using software events only.
>>>> > [6.215436] NMI watchdog: disabled (cpu0): hardware events not
>>>> enabled
>>>> > [6.222139] NMI watchdog: Shutting down hard lockup detector on
>>>> all cpus
>>>> > [6.229165] installing Xen timer for CPU 1
>>>> > [6.233849] installing Xen timer for CPU 2
>>>> > [6.238504] installing Xen timer for CPU 3
>>>> > [6.243139] installing Xen timer for CPU 4
>>>> > [6.247836] installing Xen timer for CPU 5
>>>> > [6.252478] installing Xen timer for CPU 6
>>>> > [6.257155] installing Xen timer for CPU 7
>>>> > [6.261795] installing Xen timer for CPU 8
>>>> > [6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data
>>>> 1.
>>>> > [6.272736] [ cut here ]
>>>> > [6.277358] kernel BUG at arch/x86/kernel/cpu/common.c:997!
>>>> > [6.280104] random: fast init done
>>>> > [6.286333] invalid opcode:  [#1] SMP
>>>> > [6.290343] Modules linked in:
>>>> > [6.293430] CPU: 8 PID: 0 Comm: swapper/8 Not tainted
>>>> 4.9.20-26.el7.x86_64 #1
>>>> > [6.300568] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a
>>>> 08/04/2015
>>>> > [6.307183] task: 880058a68000 task.stack: c900400c
>>>> > [6.313103] RIP: e030:[]  []
>>>> identify_secondary_cpu+0x57/0x80
>>>> > [6.322019] RSP: e02b:c900400c3f08  EFLAGS: 00010086
>>>> > [6.327333] RAX: ffe4 RBX: 88005d80a020 RCX:
>>>> 81e5ffc8
>>>> > [6.334473] RDX: 0001 RSI: 0005 RDI:
>>>> 0005
>>>> > [6.341607] RBP: c900400c3f18 R08: 00ce R09:
>>>>
>>>> > [6.348738] R10: 0005 R11: 0006 R12:
>>>> 0008
>>>> > [6.355873] R13:  R14:  R15:
>>>>
>>>> > [6.363006] FS:  () GS:88005d80()
>>>> knlGS:
>>>> > [6.371090] CS:  e033 DS: 002b ES: 002b CR0: 80050033
>>>> > [6.376837] CR2:  CR3: 01e07000 CR4:
>>>> 00042660
>>>> > [6.383970] Stack:
>>>> > [6.386004]  0008  c900400c3f28
>>>> 8104ebce
>>>> > [6.393483]  c900400c3f40 81029855 
>>>> c900400c3f50

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-18 Thread PJ Welsh
Apologies: I installed the newer -26 kernel and had not rebooted into it.
The grub2 menu item should have been "CentOS Linux (4.9.20-25.el7.x86_64) 7
(Core)". I am currently restarting that remote affected system (unmodified
grub2 entry first).
Thanks
PJ
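
The mix-up described above (a newer kernel installed but not yet booted) can be caught with a quick comparison of the running release against the one under test. A minimal Python sketch follows; the function name booted_into is hypothetical, and the release strings are the ones quoted in this thread.

```python
import platform


def booted_into(expected_release, running_release=None):
    """Return True if the running kernel release equals the expected one.

    When running_release is omitted, the release of the current host is
    used (the same string that `uname -r` prints).
    """
    if running_release is None:
        running_release = platform.release()
    return running_release == expected_release


# Installed -26, but the host was still running -25.
print(booted_into("4.9.20-26.el7.x86_64", "4.9.20-25.el7.x86_64"))
```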

On Tue, Apr 18, 2017 at 8:39 AM, PJ Welsh  wrote:

> Just to note, the same pattern happens on C7:
> "CentOS Linux, with Xen hypervisor" = reboot
> "CentOS Linux (4.9.20-26.el7.x86_64) 7 (Core)" = boot
>
> [root@XXX ~]# uname -a
> Linux XXX 4.9.20-25.el7.x86_64 #1 SMP Fri Mar 31 08:53:28 CDT 2017 x86_64
> x86_64 x86_64
>
> On Tue, Apr 18, 2017 at 8:36 AM, PJ Welsh  wrote:
>
>> There was a note that the non-Xen kernel at the same kernel version did
>> indeed boot:
>> "CentOS-6 4.9.20-26 kernel exhibits the same constant
>> kernel-start-then-reboot issue when booting under the "CentOS Linux, with
>> Xen hypervisor" grub2 menu option. However, it *does* properly boot under
>> the "CentOS Linux (4.9.20-25.el7.x86_64) 7 (Core)" grub2 menu option!"
>>
>> Trying to get back into being able to test this more.
>>
>> Thanks
>> PJ
>>
>> On Tue, Apr 18, 2017 at 8:30 AM, Johnny Hughes  wrote:
>>
>>> On 04/14/2017 03:26 PM, Anderson, Dave wrote:
>>> > Sad to say that I already tested 4.9.20-26 from your repo
>>> yesterday...it does look a little cleaner before it dies, but still dies. I
>>> have not tested it with the vcpu=4 workaround, but I can tonight if you
>>> would like. Relevant bits below:
>>> >
>>> > Loading Xen 4.6.3-12.el7 ...
>>> > Loading Linux 4.9.20-26.el7.x86_64 ...
>>> > Loading initial ramdisk ...
>>> > [0.00] Linux version 4.9.20-26.el7.x86_64 (mockbuild@) (gcc
>>> version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Apr 4 11:19:26
>>> CDT 2017
>>> >
>>> > 
>>> >
>>> > [6.195089] smpboot: Max logical packages: 1
>>> > [6.199549] VPMU disabled by hypervisor.
>>> > [6.203663] Performance Events: SandyBridge events, PMU not
>>> available due to virtualization, using software events only.
>>> > [6.215436] NMI watchdog: disabled (cpu0): hardware events not
>>> enabled
>>> > [6.222139] NMI watchdog: Shutting down hard lockup detector on all
>>> cpus
>>> > [6.229165] installing Xen timer for CPU 1
>>> > [6.233849] installing Xen timer for CPU 2
>>> > [6.238504] installing Xen timer for CPU 3
>>> > [6.243139] installing Xen timer for CPU 4
>>> > [6.247836] installing Xen timer for CPU 5
>>> > [6.252478] installing Xen timer for CPU 6
>>> > [6.257155] installing Xen timer for CPU 7
>>> > [6.261795] installing Xen timer for CPU 8
>>> > [6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
>>> > [6.272736] [ cut here ]
>>> > [6.277358] kernel BUG at arch/x86/kernel/cpu/common.c:997!
>>> > [6.280104] random: fast init done
>>> > [6.286333] invalid opcode:  [#1] SMP
>>> > [6.290343] Modules linked in:
>>> > [6.293430] CPU: 8 PID: 0 Comm: swapper/8 Not tainted
>>> 4.9.20-26.el7.x86_64 #1
>>> > [6.300568] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a
>>> 08/04/2015
>>> > [6.307183] task: 880058a68000 task.stack: c900400c
>>> > [6.313103] RIP: e030:[]  []
>>> identify_secondary_cpu+0x57/0x80
>>> > [6.322019] RSP: e02b:c900400c3f08  EFLAGS: 00010086
>>> > [6.327333] RAX: ffe4 RBX: 88005d80a020 RCX:
>>> 81e5ffc8
>>> > [6.334473] RDX: 0001 RSI: 0005 RDI:
>>> 0005
>>> > [6.341607] RBP: c900400c3f18 R08: 00ce R09:
>>> 
>>> > [6.348738] R10: 0005 R11: 0006 R12:
>>> 0008
>>> > [6.355873] R13:  R14:  R15:
>>> 
>>> > [6.363006] FS:  () GS:88005d80()
>>> knlGS:
>>> > [6.371090] CS:  e033 DS: 002b ES: 002b CR0: 80050033
>>> > [6.376837] CR2:  CR3: 01e07000 CR4:
>>> 00042660
>>> > [6.383970] Stack:
>>> > [6.386004]  0008  c900400c3f28
>>> 8104ebce
>>> > [6.393483]  c900400c3f40 81029855 
>>> c900400c3f50
>>> > [6.400963]  810298d0  
>>> 
>>> > [6.408450] Call Trace:
>>> > [6.410907]  [] smp_store_cpu_info+0x3e/0x40
>>> > [6.416753]  [] cpu_bringup+0x35/0x90
>>> > [6.421981]  [] cpu_bringup_and_idle+0x20/0x40
>>> > [6.427987] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75
>>> 1c 0f b7 bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3
>>> 0f 0b <0f> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 e8 ce ca 81
>>> > [6.448249] RIP  [] identify_secondary_cpu+0x57/0x80
>>> > [6.454801]  RSP 
>>> > [6.458305] ---[ end trace 2f9b62c5c7050204 ]---
>>> >
>>> >
>>> > So 

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-18 Thread PJ Welsh
Just to note, the same pattern happens on C7:
"CentOS Linux, with Xen hypervisor" = reboot
"CentOS Linux (4.9.20-26.el7.x86_64) 7 (Core)" = boot

[root@XXX ~]# uname -a
Linux XXX 4.9.20-25.el7.x86_64 #1 SMP Fri Mar 31 08:53:28 CDT 2017 x86_64
x86_64 x86_64

On Tue, Apr 18, 2017 at 8:36 AM, PJ Welsh  wrote:

> There was a note that the non-Xen kernel at the same kernel version did
> indeed boot:
> "CentOS-6 4.9.20-26 kernel exhibits the same constant
> kernel-start-then-reboot issue when booting under the "CentOS Linux, with
> Xen hypervisor" grub2 menu option. However, it *does* properly boot under
> the "CentOS Linux (4.9.20-25.el7.x86_64) 7 (Core)" grub2 menu option!"
>
> Trying to get back into being able to test this more.
>
> Thanks
> PJ
>
> On Tue, Apr 18, 2017 at 8:30 AM, Johnny Hughes  wrote:
>
>> On 04/14/2017 03:26 PM, Anderson, Dave wrote:
>> > Sad to say that I already tested 4.9.20-26 from your repo
>> yesterday...it does look a little cleaner before it dies, but still dies. I
>> have not tested it with the vcpu=4 workaround, but I can tonight if you
>> would like. Relevant bits below:
>> >
>> > Loading Xen 4.6.3-12.el7 ...
>> > Loading Linux 4.9.20-26.el7.x86_64 ...
>> > Loading initial ramdisk ...
>> > [0.00] Linux version 4.9.20-26.el7.x86_64 (mockbuild@) (gcc
>> version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Apr 4 11:19:26
>> CDT 2017
>> >
>> > 
>> >
>> > [6.195089] smpboot: Max logical packages: 1
>> > [6.199549] VPMU disabled by hypervisor.
>> > [6.203663] Performance Events: SandyBridge events, PMU not
>> available due to virtualization, using software events only.
>> > [6.215436] NMI watchdog: disabled (cpu0): hardware events not
>> enabled
>> > [6.222139] NMI watchdog: Shutting down hard lockup detector on all
>> cpus
>> > [6.229165] installing Xen timer for CPU 1
>> > [6.233849] installing Xen timer for CPU 2
>> > [6.238504] installing Xen timer for CPU 3
>> > [6.243139] installing Xen timer for CPU 4
>> > [6.247836] installing Xen timer for CPU 5
>> > [6.252478] installing Xen timer for CPU 6
>> > [6.257155] installing Xen timer for CPU 7
>> > [6.261795] installing Xen timer for CPU 8
>> > [6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
>> > [6.272736] [ cut here ]
>> > [6.277358] kernel BUG at arch/x86/kernel/cpu/common.c:997!
>> > [6.280104] random: fast init done
>> > [6.286333] invalid opcode:  [#1] SMP
>> > [6.290343] Modules linked in:
>> > [6.293430] CPU: 8 PID: 0 Comm: swapper/8 Not tainted
>> 4.9.20-26.el7.x86_64 #1
>> > [6.300568] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a
>> 08/04/2015
>> > [6.307183] task: 880058a68000 task.stack: c900400c
>> > [6.313103] RIP: e030:[]  []
>> identify_secondary_cpu+0x57/0x80
>> > [6.322019] RSP: e02b:c900400c3f08  EFLAGS: 00010086
>> > [6.327333] RAX: ffe4 RBX: 88005d80a020 RCX:
>> 81e5ffc8
>> > [6.334473] RDX: 0001 RSI: 0005 RDI:
>> 0005
>> > [6.341607] RBP: c900400c3f18 R08: 00ce R09:
>> 
>> > [6.348738] R10: 0005 R11: 0006 R12:
>> 0008
>> > [6.355873] R13:  R14:  R15:
>> 
>> > [6.363006] FS:  () GS:88005d80()
>> knlGS:
>> > [6.371090] CS:  e033 DS: 002b ES: 002b CR0: 80050033
>> > [6.376837] CR2:  CR3: 01e07000 CR4:
>> 00042660
>> > [6.383970] Stack:
>> > [6.386004]  0008  c900400c3f28
>> 8104ebce
>> > [6.393483]  c900400c3f40 81029855 
>> c900400c3f50
>> > [6.400963]  810298d0  
>> 
>> > [6.408450] Call Trace:
>> > [6.410907]  [] smp_store_cpu_info+0x3e/0x40
>> > [6.416753]  [] cpu_bringup+0x35/0x90
>> > [6.421981]  [] cpu_bringup_and_idle+0x20/0x40
>> > [6.427987] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c
>> 0f b7 bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f
>> 0b <0f> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 e8 ce ca 81
>> > [6.448249] RIP  [] identify_secondary_cpu+0x57/0x80
>> > [6.454801]  RSP 
>> > [6.458305] ---[ end trace 2f9b62c5c7050204 ]---
>> >
>> >
>> > So basically, it removes the "[Firmware Bug]: CPU1: APIC id mismatch.
>> Firmware: 0 APIC: 1"  lines, but otherwise dies the same way. I included a
>> few extra lines up from the panic because the "[6.195089] smpboot: Max
>> logical packages: 1" could possibly be relevant, I need to go look at a
>> clean boot to see if that was in there on this machine.
>> >
>> >
>> > Even more strangely, in addition to the machine I'm talking about 

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-18 Thread PJ Welsh
There was a note that the non-Xen kernel at the same kernel version did
indeed boot:
"CentOS-6 4.9.20-26 kernel exhibits the same constant
kernel-start-then-reboot issue when booting under the "CentOS Linux, with
Xen hypervisor" grub2 menu option. However, it *does* properly boot under
the "CentOS Linux (4.9.20-25.el7.x86_64) 7 (Core)" grub2 menu option!"

Trying to get back into being able to test this more.

Thanks
PJ

On Tue, Apr 18, 2017 at 8:30 AM, Johnny Hughes  wrote:

> On 04/14/2017 03:26 PM, Anderson, Dave wrote:
> > Sad to say that I already tested 4.9.20-26 from your repo yesterday...it
> does look a little cleaner before it dies, but still dies. I have not
> tested it with the vcpu=4 workaround, but I can tonight if you would like.
> Relevant bits below:
> >
> > Loading Xen 4.6.3-12.el7 ...
> > Loading Linux 4.9.20-26.el7.x86_64 ...
> > Loading initial ramdisk ...
> > [0.00] Linux version 4.9.20-26.el7.x86_64 (mockbuild@) (gcc
> version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Apr 4 11:19:26
> CDT 2017
> >
> > 
> >
> > [6.195089] smpboot: Max logical packages: 1
> > [6.199549] VPMU disabled by hypervisor.
> > [6.203663] Performance Events: SandyBridge events, PMU not available
> due to virtualization, using software events only.
> > [6.215436] NMI watchdog: disabled (cpu0): hardware events not enabled
> > [6.222139] NMI watchdog: Shutting down hard lockup detector on all
> cpus
> > [6.229165] installing Xen timer for CPU 1
> > [6.233849] installing Xen timer for CPU 2
> > [6.238504] installing Xen timer for CPU 3
> > [6.243139] installing Xen timer for CPU 4
> > [6.247836] installing Xen timer for CPU 5
> > [6.252478] installing Xen timer for CPU 6
> > [6.257155] installing Xen timer for CPU 7
> > [6.261795] installing Xen timer for CPU 8
> > [6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
> > [6.272736] [ cut here ]
> > [6.277358] kernel BUG at arch/x86/kernel/cpu/common.c:997!
> > [6.280104] random: fast init done
> > [6.286333] invalid opcode:  [#1] SMP
> > [6.290343] Modules linked in:
> > [6.293430] CPU: 8 PID: 0 Comm: swapper/8 Not tainted
> 4.9.20-26.el7.x86_64 #1
> > [6.300568] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a
> 08/04/2015
> > [6.307183] task: 880058a68000 task.stack: c900400c
> > [6.313103] RIP: e030:[]  []
> identify_secondary_cpu+0x57/0x80
> > [6.322019] RSP: e02b:c900400c3f08  EFLAGS: 00010086
> > [6.327333] RAX: ffe4 RBX: 88005d80a020 RCX:
> 81e5ffc8
> > [6.334473] RDX: 0001 RSI: 0005 RDI:
> 0005
> > [6.341607] RBP: c900400c3f18 R08: 00ce R09:
> 
> > [6.348738] R10: 0005 R11: 0006 R12:
> 0008
> > [6.355873] R13:  R14:  R15:
> 
> > [6.363006] FS:  () GS:88005d80()
> knlGS:
> > [6.371090] CS:  e033 DS: 002b ES: 002b CR0: 80050033
> > [6.376837] CR2:  CR3: 01e07000 CR4:
> 00042660
> > [6.383970] Stack:
> > [6.386004]  0008  c900400c3f28
> 8104ebce
> > [6.393483]  c900400c3f40 81029855 
> c900400c3f50
> > [6.400963]  810298d0  
> 
> > [6.408450] Call Trace:
> > [6.410907]  [] smp_store_cpu_info+0x3e/0x40
> > [6.416753]  [] cpu_bringup+0x35/0x90
> > [6.421981]  [] cpu_bringup_and_idle+0x20/0x40
> > [6.427987] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c
> 0f b7 bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f
> 0b <0f> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 e8 ce ca 81
> > [6.448249] RIP  [] identify_secondary_cpu+0x57/
> 0x80
> > [6.454801]  RSP 
> > [6.458305] ---[ end trace 2f9b62c5c7050204 ]---
> >
> >
> > So basically, it removes the "[Firmware Bug]: CPU1: APIC id mismatch.
> Firmware: 0 APIC: 1"  lines, but otherwise dies the same way. I included a
> few extra lines up from the panic because the "[6.195089] smpboot: Max
> logical packages: 1" could possibly be relevant, I need to go look at a
> clean boot to see if that was in there on this machine.
> >
> >
> > Even more strangely, in addition to the machine I'm talking about which
> panics and reboots, I had a second nearly identical machine (different
> CPU/ram config, everything else the same) which booted but had some kind of
> hw conflict with 4.9.x that I never had before. It appears to be between
> Intel SCU and an intel PCIe NVMe SSD (luckily I wasn't using SCU, so I
> disabled that). Had that other machine not booted I would have just assumed
> 4.9.X was totally broken and sat on 3.18...so I'm glad that 

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-18 Thread Johnny Hughes
On 04/14/2017 03:26 PM, Anderson, Dave wrote:
> Sad to say that I already tested 4.9.20-26 from your repo yesterday...it does 
> look a little cleaner before it dies, but still dies. I have not tested it 
> with the vcpu=4 workaround, but I can tonight if you would like. Relevant bits 
> below:
> 
> Loading Xen 4.6.3-12.el7 ...
> Loading Linux 4.9.20-26.el7.x86_64 ...
> Loading initial ramdisk ...
> [0.00] Linux version 4.9.20-26.el7.x86_64 (mockbuild@) (gcc version 
> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Apr 4 11:19:26 CDT 2017
> 
> 
> 
> [6.195089] smpboot: Max logical packages: 1
> [6.199549] VPMU disabled by hypervisor.
> [6.203663] Performance Events: SandyBridge events, PMU not available due 
> to virtualization, using software events only.
> [6.215436] NMI watchdog: disabled (cpu0): hardware events not enabled
> [6.222139] NMI watchdog: Shutting down hard lockup detector on all cpus
> [6.229165] installing Xen timer for CPU 1
> [6.233849] installing Xen timer for CPU 2
> [6.238504] installing Xen timer for CPU 3
> [6.243139] installing Xen timer for CPU 4
> [6.247836] installing Xen timer for CPU 5
> [6.252478] installing Xen timer for CPU 6
> [6.257155] installing Xen timer for CPU 7
> [6.261795] installing Xen timer for CPU 8
> [6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
> [6.272736] [ cut here ]
> [6.277358] kernel BUG at arch/x86/kernel/cpu/common.c:997!
> [6.280104] random: fast init done
> [6.286333] invalid opcode:  [#1] SMP
> [6.290343] Modules linked in:
> [6.293430] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.20-26.el7.x86_64 
> #1
> [6.300568] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
> [6.307183] task: 880058a68000 task.stack: c900400c
> [6.313103] RIP: e030:[]  [] 
> identify_secondary_cpu+0x57/0x80
> [6.322019] RSP: e02b:c900400c3f08  EFLAGS: 00010086
> [6.327333] RAX: ffe4 RBX: 88005d80a020 RCX: 
> 81e5ffc8
> [6.334473] RDX: 0001 RSI: 0005 RDI: 
> 0005
> [6.341607] RBP: c900400c3f18 R08: 00ce R09: 
> 
> [6.348738] R10: 0005 R11: 0006 R12: 
> 0008
> [6.355873] R13:  R14:  R15: 
> 
> [6.363006] FS:  () GS:88005d80() 
> knlGS:
> [6.371090] CS:  e033 DS: 002b ES: 002b CR0: 80050033
> [6.376837] CR2:  CR3: 01e07000 CR4: 
> 00042660
> [6.383970] Stack:
> [6.386004]  0008  c900400c3f28 
> 8104ebce
> [6.393483]  c900400c3f40 81029855  
> c900400c3f50
> [6.400963]  810298d0   
> 
> [6.408450] Call Trace:
> [6.410907]  [] smp_store_cpu_info+0x3e/0x40
> [6.416753]  [] cpu_bringup+0x35/0x90
> [6.421981]  [] cpu_bringup_and_idle+0x20/0x40
> [6.427987] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 
> bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 
> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 e8 ce ca 81 
> [6.448249] RIP  [] identify_secondary_cpu+0x57/0x80
> [6.454801]  RSP 
> [6.458305] ---[ end trace 2f9b62c5c7050204 ]---
> 
> 
> So basically, it removes the "[Firmware Bug]: CPU1: APIC id mismatch. 
> Firmware: 0 APIC: 1"  lines, but otherwise dies the same way. I included a 
> few extra lines up from the panic because the "[6.195089] smpboot: Max 
> logical packages: 1" could possibly be relevant, I need to go look at a clean 
> boot to see if that was in there on this machine.
> 
> 
> Even more strangely, in addition to the machine I'm talking about which 
> panics and reboots, I had a second nearly identical machine (different 
> CPU/ram config, everything else the same) which booted but had some kind of 
> hw conflict with 4.9.x that I never had before. It appears to be between 
> Intel SCU and an intel PCIe NVMe SSD (luckily I wasn't using SCU, so I 
> disabled that). Had that other machine not booted I would have just assumed 
> 4.9.X was totally broken and sat on 3.18...so I'm glad that one machine 
> booted at least :)
> 
> Thanks,
> -Dave

Dave,

Just for testing purposes, can you try booting the kernel in the normal
way on the machine that does not work (a normal grub entry for the kernel
with no xen.gz line)?

That way, we can hopefully narrow the issue down to a hypervisor issue
or a kernel config issue.
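
When comparing the two boots, it can help to confirm which way a given boot actually came up. Below is a minimal sketch, assuming the usual Linux interfaces: /sys/hypervisor/type reads "xen" when the kernel runs on Xen, and /proc/xen/capabilities contains "control_d" in dom0 (this needs xenfs mounted). The function names are made up for illustration.

```python
def read_text(path):
    """Return the stripped contents of path, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def classify_boot(hypervisor_type, xen_capabilities):
    """Classify a boot from the contents of the two files described above.

    hypervisor_type:  contents of /sys/hypervisor/type (None if absent)
    xen_capabilities: contents of /proc/xen/capabilities (None if absent)
    """
    if hypervisor_type != "xen":
        return "not running under Xen"
    if xen_capabilities and "control_d" in xen_capabilities:
        return "Xen dom0"
    return "Xen guest (domU)"


if __name__ == "__main__":
    print(classify_boot(read_text("/sys/hypervisor/type"),
                        read_text("/proc/xen/capabilities")))
```

If the plain grub entry reports "not running under Xen" and boots cleanly while the Xen entry panics, that would point toward the hypervisor side rather than the kernel config.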

Thanks,
Johnny Hughes

> 
> 
>> On Apr 14, 2017, at 05:39, Johnny Hughes  wrote:
>>
>> Dave,
>>
>> Take a look at this kernel as it is the one I think we are going to
>> release (or a slightly newer 4.9.2x from kernel.org LTS). This version
>> 

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-14 Thread Anderson, Dave
I also just realized the C6 portion of the title/subject line here refers to 
CentOS 6, so I'd like to clarify that all my testing/issues/etc was under 
CentOS 7.3 with all patches applied.


Thanks,
-Dave



> On Apr 14, 2017, at 1:39 PM, Anderson, Dave  wrote:
> 
> So, strangely,
> 
> I have two _identical_ dualproc xeon mobos (same bios/ipmi versions, they 
> even share an enclosure, one is right side, other is left), each with 
> different cpu/memory:
> 
> 
> Using 4.9.13 with vcpu limited to 4, early in the boot process, the one that 
> _was_ booting before setting the xen vcpu args says:
> "[7.060720] smpboot: Max logical packages: 2", 
> 
> and the other one says 
> "[6.195089] smpboot: Max logical packages: 1"
> 
> 
> 
> They both have dual procs, known working/good. 
> 
> 
> The first (the one that worked unmodified) has dual 8 core (16 HT/ea) and 
> correctly detects "[0.00] smpboot: Allowing 32 CPUs, 0 hotplug CPUs". 
> It's a Xeon E5-2665v1.
> 
> The second machine (didn't work without the xen vcpu args) has dual 4 core 
> (8ht/ea) and also correctly detects "[0.00] smpboot: Allowing 16 
> CPUs, 0 hotplug CPUs". It's a Xeon E5-2643v1...so it seems like this one does 
> ok until it decides there's only one cpu package?
> 
> Thanks,
> -Dave
> 
> 
>> On Apr 14, 2017, at 13:26, Anderson, Dave  wrote:
>> 
>> Sad to say that I already tested 4.9.20-26 from your repo yesterday...it 
>> does look a little cleaner before it dies, but still dies. I have not tested 
>> it with the vcpu=4 workaround, but I can tonight if you would like. Relevant 
>> bits below:
>> 
>> Loading Xen 4.6.3-12.el7 ...
>> Loading Linux 4.9.20-26.el7.x86_64 ...
>> Loading initial ramdisk ...
>> [0.00] Linux version 4.9.20-26.el7.x86_64 (mockbuild@) (gcc version 
>> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Apr 4 11:19:26 CDT 2017
>> 
>> 
>> 
>> [6.195089] smpboot: Max logical packages: 1
>> [6.199549] VPMU disabled by hypervisor.
>> [6.203663] Performance Events: SandyBridge events, PMU not available due 
>> to virtualization, using software events only.
>> [6.215436] NMI watchdog: disabled (cpu0): hardware events not enabled
>> [6.222139] NMI watchdog: Shutting down hard lockup detector on all cpus
>> [6.229165] installing Xen timer for CPU 1
>> [6.233849] installing Xen timer for CPU 2
>> [6.238504] installing Xen timer for CPU 3
>> [6.243139] installing Xen timer for CPU 4
>> [6.247836] installing Xen timer for CPU 5
>> [6.252478] installing Xen timer for CPU 6
>> [6.257155] installing Xen timer for CPU 7
>> [6.261795] installing Xen timer for CPU 8
>> [6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
>> [6.272736] [ cut here ]
>> [6.277358] kernel BUG at arch/x86/kernel/cpu/common.c:997!
>> [6.280104] random: fast init done
>> [6.286333] invalid opcode:  [#1] SMP
>> [6.290343] Modules linked in:
>> [6.293430] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 
>> 4.9.20-26.el7.x86_64 #1
>> [6.300568] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
>> [6.307183] task: 880058a68000 task.stack: c900400c
>> [6.313103] RIP: e030:[]  [] 
>> identify_secondary_cpu+0x57/0x80
>> [6.322019] RSP: e02b:c900400c3f08  EFLAGS: 00010086
>> [6.327333] RAX: ffe4 RBX: 88005d80a020 RCX: 
>> 81e5ffc8
>> [6.334473] RDX: 0001 RSI: 0005 RDI: 
>> 0005
>> [6.341607] RBP: c900400c3f18 R08: 00ce R09: 
>> 
>> [6.348738] R10: 0005 R11: 0006 R12: 
>> 0008
>> [6.355873] R13:  R14:  R15: 
>> 
>> [6.363006] FS:  () GS:88005d80() 
>> knlGS:
>> [6.371090] CS:  e033 DS: 002b ES: 002b CR0: 80050033
>> [6.376837] CR2:  CR3: 01e07000 CR4: 
>> 00042660
>> [6.383970] Stack:
>> [6.386004]  0008  c900400c3f28 
>> 8104ebce
>> [6.393483]  c900400c3f40 81029855  
>> c900400c3f50
>> [6.400963]  810298d0   
>> 
>> [6.408450] Call Trace:
>> [6.410907]  [] smp_store_cpu_info+0x3e/0x40
>> [6.416753]  [] cpu_bringup+0x35/0x90
>> [6.421981]  [] cpu_bringup_and_idle+0x20/0x40
>> [6.427987] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f 
>> b7 bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b 
>> <0f> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 e8 ce ca 81 
>> [6.448249] RIP  [] identify_secondary_cpu+0x57/0x80
>> [6.454801]  RSP 
>> [6.458305] ---[ end trace 2f9b62c5c7050204 ]---
>> 
>> 
>> So basically, it removes the "[Firmware Bug]: 

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-14 Thread Anderson, Dave
So, strangely,

I have two _identical_ dualproc xeon mobos (same bios/ipmi versions, they even 
share an enclosure, one is right side, other is left), each with different 
cpu/memory:


Using 4.9.13 with vcpu limited to 4, early in the boot process, the one that 
_was_ booting before setting the xen vcpu args says:
"[7.060720] smpboot: Max logical packages: 2", 

and the other one says 
"[6.195089] smpboot: Max logical packages: 1"



They both have dual procs, known working/good. 


The first (the one that worked unmodified) has dual 8 core (16 HT/ea) and 
correctly detects "[0.00] smpboot: Allowing 32 CPUs, 0 hotplug CPUs". 
It's a Xeon E5-2665v1.

The second machine (didn't work without the xen vcpu args) has dual 4 core 
(8ht/ea) and also correctly detects "[0.00] smpboot: Allowing 16 CPUs, 
0 hotplug CPUs". It's a Xeon E5-2643v1...so it seems like this one does ok 
until it decides there's only one cpu package?
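
To make the comparison between the two machines repeatable, here is a rough sketch that pulls the two smpboot values out of a saved boot log. It only relies on the message formats quoted above; the function name and the dictionary keys are made up for illustration.

```python
import re


def summarize_smpboot(log_text):
    """Extract the CPU count and package count that smpboot reported.

    Returns None for a value whose line is not present in the log.
    """
    allowed = re.search(r"smpboot: Allowing (\d+) CPUs", log_text)
    packages = re.search(r"smpboot: Max logical packages: (\d+)", log_text)
    return {
        "allowed_cpus": int(allowed.group(1)) if allowed else None,
        "max_logical_packages": int(packages.group(1)) if packages else None,
    }


if __name__ == "__main__":
    # Lines as quoted in this message for the machine that panics.
    failing = (
        "[0.00] smpboot: Allowing 16 CPUs, 0 hotplug CPUs\n"
        "[6.195089] smpboot: Max logical packages: 1\n"
    )
    print(summarize_smpboot(failing))
```

On a dual-socket board, a "max_logical_packages" of 1 is the value that looks wrong here.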

Thanks,
-Dave


> On Apr 14, 2017, at 13:26, Anderson, Dave  wrote:
> 
> Sad to say that I already tested 4.9.20-26 from your repo yesterday...it does 
> look a little cleaner before it dies, but still dies. I have not tested it 
> with the vcpu=4 workaround, but I can tonight if you would like. Relevant bits 
> below:
> 
> Loading Xen 4.6.3-12.el7 ...
> Loading Linux 4.9.20-26.el7.x86_64 ...
> Loading initial ramdisk ...
> [0.00] Linux version 4.9.20-26.el7.x86_64 (mockbuild@) (gcc version 
> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Apr 4 11:19:26 CDT 2017
> 
> 
> 
> [6.195089] smpboot: Max logical packages: 1
> [6.199549] VPMU disabled by hypervisor.
> [6.203663] Performance Events: SandyBridge events, PMU not available due 
> to virtualization, using software events only.
> [6.215436] NMI watchdog: disabled (cpu0): hardware events not enabled
> [6.222139] NMI watchdog: Shutting down hard lockup detector on all cpus
> [6.229165] installing Xen timer for CPU 1
> [6.233849] installing Xen timer for CPU 2
> [6.238504] installing Xen timer for CPU 3
> [6.243139] installing Xen timer for CPU 4
> [6.247836] installing Xen timer for CPU 5
> [6.252478] installing Xen timer for CPU 6
> [6.257155] installing Xen timer for CPU 7
> [6.261795] installing Xen timer for CPU 8
> [6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
> [6.272736] [ cut here ]
> [6.277358] kernel BUG at arch/x86/kernel/cpu/common.c:997!
> [6.280104] random: fast init done
> [6.286333] invalid opcode:  [#1] SMP
> [6.290343] Modules linked in:
> [6.293430] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.20-26.el7.x86_64 
> #1
> [6.300568] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
> [6.307183] task: 880058a68000 task.stack: c900400c
> [6.313103] RIP: e030:[]  [] 
> identify_secondary_cpu+0x57/0x80
> [6.322019] RSP: e02b:c900400c3f08  EFLAGS: 00010086
> [6.327333] RAX: ffe4 RBX: 88005d80a020 RCX: 
> 81e5ffc8
> [6.334473] RDX: 0001 RSI: 0005 RDI: 
> 0005
> [6.341607] RBP: c900400c3f18 R08: 00ce R09: 
> 
> [6.348738] R10: 0005 R11: 0006 R12: 
> 0008
> [6.355873] R13:  R14:  R15: 
> 
> [6.363006] FS:  () GS:88005d80() 
> knlGS:
> [6.371090] CS:  e033 DS: 002b ES: 002b CR0: 80050033
> [6.376837] CR2:  CR3: 01e07000 CR4: 
> 00042660
> [6.383970] Stack:
> [6.386004]  0008  c900400c3f28 
> 8104ebce
> [6.393483]  c900400c3f40 81029855  
> c900400c3f50
> [6.400963]  810298d0   
> 
> [6.408450] Call Trace:
> [6.410907]  [] smp_store_cpu_info+0x3e/0x40
> [6.416753]  [] cpu_bringup+0x35/0x90
> [6.421981]  [] cpu_bringup_and_idle+0x20/0x40
> [6.427987] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 
> bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 
> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 e8 ce ca 81 
> [6.448249] RIP  [] identify_secondary_cpu+0x57/0x80
> [6.454801]  RSP 
> [6.458305] ---[ end trace 2f9b62c5c7050204 ]---
> 
> 
> So basically, it removes the "[Firmware Bug]: CPU1: APIC id mismatch. 
> Firmware: 0 APIC: 1"  lines, but otherwise dies the same way. I included a 
> few extra lines up from the panic because the "[6.195089] smpboot: Max 
> logical packages: 1" could possibly be relevant, I need to go look at a clean 
> boot to see if that was in there on this machine.
> 
> 
> Even more strangely, in addition to the machine I'm talking about which 
> panics and reboots, I had a 

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-14 Thread Anderson, Dave
Sad to say that I already tested 4.9.20-26 from your repo yesterday...it does 
look a little cleaner before it dies, but still dies. I have not tested it with 
the vcpu=4 workaround, but I can tonight if you would like. Relevant bits below:

Loading Xen 4.6.3-12.el7 ...
Loading Linux 4.9.20-26.el7.x86_64 ...
Loading initial ramdisk ...
[0.00] Linux version 4.9.20-26.el7.x86_64 (mockbuild@) (gcc version 
4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Apr 4 11:19:26 CDT 2017



[6.195089] smpboot: Max logical packages: 1
[6.199549] VPMU disabled by hypervisor.
[6.203663] Performance Events: SandyBridge events, PMU not available due to 
virtualization, using software events only.
[6.215436] NMI watchdog: disabled (cpu0): hardware events not enabled
[6.222139] NMI watchdog: Shutting down hard lockup detector on all cpus
[6.229165] installing Xen timer for CPU 1
[6.233849] installing Xen timer for CPU 2
[6.238504] installing Xen timer for CPU 3
[6.243139] installing Xen timer for CPU 4
[6.247836] installing Xen timer for CPU 5
[6.252478] installing Xen timer for CPU 6
[6.257155] installing Xen timer for CPU 7
[6.261795] installing Xen timer for CPU 8
[6.266358] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[6.272736] [ cut here ]
[6.277358] kernel BUG at arch/x86/kernel/cpu/common.c:997!
[6.280104] random: fast init done
[6.286333] invalid opcode:  [#1] SMP
[6.290343] Modules linked in:
[6.293430] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.20-26.el7.x86_64 #1
[6.300568] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
[6.307183] task: 880058a68000 task.stack: c900400c
[6.313103] RIP: e030:[]  [] 
identify_secondary_cpu+0x57/0x80
[6.322019] RSP: e02b:c900400c3f08  EFLAGS: 00010086
[6.327333] RAX: ffe4 RBX: 88005d80a020 RCX: 81e5ffc8
[6.334473] RDX: 0001 RSI: 0005 RDI: 0005
[6.341607] RBP: c900400c3f18 R08: 00ce R09: 
[6.348738] R10: 0005 R11: 0006 R12: 0008
[6.355873] R13:  R14:  R15: 
[6.363006] FS:  () GS:88005d80() 
knlGS:
[6.371090] CS:  e033 DS: 002b ES: 002b CR0: 80050033
[6.376837] CR2:  CR3: 01e07000 CR4: 00042660
[6.383970] Stack:
[6.386004]  0008  c900400c3f28 
8104ebce
[6.393483]  c900400c3f40 81029855  
c900400c3f50
[6.400963]  810298d0   

[6.408450] Call Trace:
[6.410907]  [] smp_store_cpu_info+0x3e/0x40
[6.416753]  [] cpu_bringup+0x35/0x90
[6.421981]  [] cpu_bringup_and_idle+0x20/0x40
[6.427987] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 
bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 
0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 e8 ce ca 81 
[6.448249] RIP  [] identify_secondary_cpu+0x57/0x80
[6.454801]  RSP 
[6.458305] ---[ end trace 2f9b62c5c7050204 ]---


So basically, it removes the "[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 
0 APIC: 1" lines, but otherwise dies the same way. I included a few extra 
lines up from the panic because the "[6.195089] smpboot: Max logical 
packages: 1" line could possibly be relevant; I need to go look at a clean boot 
to see if that was in there on this machine.


Even more strangely, in addition to the machine I'm talking about which panics 
and reboots, I had a second nearly identical machine (different CPU/ram config, 
everything else the same) which booted but had some kind of hw conflict with 
4.9.x that I never had before. It appears to be between Intel SCU and an intel 
PCIe NVMe SSD (luckily I wasn't using SCU, so I disabled that). Had that other 
machine not booted I would have just assumed 4.9.X was totally broken and sat 
on 3.18...so I'm glad that one machine booted at least :)

Thanks,
-Dave


> On Apr 14, 2017, at 05:39, Johnny Hughes  wrote:
> 
> Dave,
> 
> Take a look at this kernel as it is the one I think we are going to
> release (or a slightly newer 4.9.2x from kernel.org LTS). This version
> has some newer settings that are more redhat/fedora/centos base kernel
> like WRT what is a module and what is built into the kernel, etc.
> 
> https://people.centos.org/hughesjr/4.9.x/
> 
> Thanks,
> Johnny Hughes
> 
> On 04/14/2017 05:16 AM, Anderson, Dave wrote:
>> List moderator: feel free to delete my previous large message with 
>> attachments that's in the moderation queue...it's now obsolete anyway.
>> 
>> 
>> I have found a fix/workaround for my reboot issues with Xen 4.6.3-12 + 
>> Kernel 4.9.13:
>> 
>> Once I finally got serial 

Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-14 Thread PJ Welsh
Very nice on the sleuthing!
Thanks

On Fri, Apr 14, 2017 at 5:16 AM, Anderson, Dave 
wrote:

> List moderator: feel free to delete my previous large message with
> attachments that's in the moderation queue...it's now obsolete anyway.
>
>
> I have found a fix/workaround for my reboot issues with Xen 4.6.3-12 +
> Kernel 4.9.13:
>
> Once I finally got serial output all the way through the boot process
> (xen+dom0) I discovered the stack trace:
>
> [Firmware Bug]: CPU7: APIC id mismatch. Firmware: 0 APIC: 7
> installing Xen timer for CPU 8
> [Firmware Bug]: CPU8: APIC id mismatch. Firmware: 0 APIC: 20
> smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
> [ cut here ]
> kernel BUG at arch/x86/kernel/cpu/common.c:997!
> invalid opcode:  [#1] SMP
> Modules linked in:
> CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.13-22.el7.x86_64 #1
> Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
> random: fast init done
> task: 880058a8c4c0 task.stack: c900400b4000
> RIP: e030:[]  []
> identify_secondary_cpu+0x57/0x80
> RSP: e02b:c900400b7f08  EFLAGS: 00010086
> RAX: ffe4 RBX: 88005d80a020 RCX: 81c5be68
> RDX: 0001 RSI: 0005 RDI: 0005
> RBP: c900400b7f18 R08: 00cb R09: 0004
> R10:  R11: 0006 R12: 0008
> R13:  R14:  R15: 
> FS:  () GS:88005d80()
> knlGS:
> CS:  e033 DS: 002b ES: 002b CR0: 80050033
> CR2:  CR3: 01c07000 CR4: 00042660
> Stack:
>  0008  c900400b7f28 8104e94e
>  c900400b7f40 81029925  c900400b7f50
>  810299a0   
> Call Trace:
>  [] smp_store_cpu_info+0x3e/0x40
>  [] cpu_bringup+0x35/0x90
>  [] cpu_bringup_and_idle+0x20/0x40
> Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 bb da 00 00
> 00 44 89 e6 e8 24 03 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 0f b7
> 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 98 87 a6 81
> RIP  [] identify_secondary_cpu+0x57/0x80
>  RSP 
> ---[ end trace dc5563100443876e ]---
>
> I surmised that reducing the number of dom0 vcpu might solve this issue
> (they were unbounded)
>
> In testing adding "dom0_max_vcpus=4 dom0_vcpus_pin" to the
> GRUB_CMDLINE_XEN_DEFAULT line in /etc/default/grub and re-running
> grub2-mkconfig has resulted in the system I have that never booted Xen
> 4.6.3-12 + Kernel 4.9.13, booting every single time out of 5-10 tests.
>
>
> So...I don't know if there's a race condition somewhere, or
> what...but...so far this workaround has not failed me.
>
> Thanks,
> -Dave
>
>
>
> > On Fri, Apr 7, 2017 at 6:58 AM, PJ Welsh  >> wrote:
> >> I've not gotten any bites from my posting on the xen-devel mailing list.
> >> Here is the only one to-date:
> >> https://lists.xen.org/archives/html/xen-devel/2017-04/msg01069.html
> >>
> >> From that email, there needs to be some hypervisor messages.
> >>
> >> Does anyone know how to produce the hypervisor messages? I've already
> >
> >> removed the rhgb and quiet options from the boot.
> >
> >>
> >> Thanks
> >> PJ
> >
> >
> > I spoke too soon. To get more information: Please see
> >
> > https://wiki.xenproject.org/wiki/Reporting_Bugs_against_Xen_Project
> >
> > and
> >
> > https://wiki.xenproject.org/wiki/Xen_Serial_Console
> >
> > or alternatively at least add "vga=keep".
> >
> > pjwelsh
>
>
> ___
> CentOS-virt mailing list
> CentOS-virt@centos.org
> https://lists.centos.org/mailman/listinfo/centos-virt
>


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-14 Thread PJ Welsh
I am on holiday until Sunday, but will download the kernel now and test it
when I get back into work.
Thanks

On Fri, Apr 14, 2017 at 7:39 AM, Johnny Hughes  wrote:

> Dave,
>
> Take a look at this kernel as it is the one I think we are going to
> release (or a slightly newer 4.9.2x from kernel.org LTS). This version
> has some newer settings that are more redhat/fedora/centos base kernel
> like WRT what is a module and what is built into the kernel, etc.
>
> https://people.centos.org/hughesjr/4.9.x/
>
> Thanks,
> Johnny Hughes
>
> On 04/14/2017 05:16 AM, Anderson, Dave wrote:
> > List moderator: feel free to delete my previous large message with
> attachments that's in the moderation queue...it's now obsolete anyway.
> >
> >
> > I have found a fix/workaround for my reboot issues with Xen 4.6.3-12 +
> Kernel 4.9.13:
> >
> > Once I finally got serial output all the way through the boot process
> (xen+dom0) I discovered the stack trace:
> >
> > [Firmware Bug]: CPU7: APIC id mismatch. Firmware: 0 APIC: 7
> > installing Xen timer for CPU 8
> > [Firmware Bug]: CPU8: APIC id mismatch. Firmware: 0 APIC: 20
> > smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
> > [ cut here ]
> > kernel BUG at arch/x86/kernel/cpu/common.c:997!
> > invalid opcode:  [#1] SMP
> > Modules linked in:
> > CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.13-22.el7.x86_64 #1
> > Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
> > random: fast init done
> > task: 880058a8c4c0 task.stack: c900400b4000
> > RIP: e030:[]  []
> identify_secondary_cpu+0x57/0x80
> > RSP: e02b:c900400b7f08  EFLAGS: 00010086
> > RAX: ffe4 RBX: 88005d80a020 RCX: 81c5be68
> > RDX: 0001 RSI: 0005 RDI: 0005
> > RBP: c900400b7f18 R08: 00cb R09: 0004
> > R10:  R11: 0006 R12: 0008
> > R13:  R14:  R15: 
> > FS:  () GS:88005d80()
> knlGS:
> > CS:  e033 DS: 002b ES: 002b CR0: 80050033
> > CR2:  CR3: 01c07000 CR4: 00042660
> > Stack:
> >  0008  c900400b7f28 8104e94e
> >  c900400b7f40 81029925  c900400b7f50
> >  810299a0   
> > Call Trace:
> >  [] smp_store_cpu_info+0x3e/0x40
> >  [] cpu_bringup+0x35/0x90
> >  [] cpu_bringup_and_idle+0x20/0x40
> > Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 bb da 00
> 00 00 44 89 e6 e8 24 03 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 0f
> b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 98 87 a6 81
> > RIP  [] identify_secondary_cpu+0x57/0x80
> >  RSP 
> > ---[ end trace dc5563100443876e ]---
> >
> > I surmised that reducing the number of dom0 vcpu might solve this issue
> (they were unbounded)
> >
> > In testing adding "dom0_max_vcpus=4 dom0_vcpus_pin" to the
> GRUB_CMDLINE_XEN_DEFAULT line in /etc/default/grub and re-running
> grub2-mkconfig has resulted in the system I have that never booted Xen
> 4.6.3-12 + Kernel 4.9.13, booting every single time out of 5-10 tests.
> >
> >
> > So...I don't know if there's a race condition somewhere, or
> what...but...so far this workaround has not failed me.
> >
> > Thanks,
> > -Dave
> >
> >
> >
> >> On Fri, Apr 7, 2017 at 6:58 AM, PJ Welsh  >>> wrote:
> >>> I've not gotten any bites from my posting on the xen-devel mailing
> list.
> >>> Here is the only one to-date:
> >>> https://lists.xen.org/archives/html/xen-devel/2017-04/msg01069.html
> >>>
> >>> From that email, there needs to be some hypervisor messages.
> >>>
> >>> Does anyone know how to produce the hypervisor messages? I've already
> >>
> >>> removed the rhgb and quiet options from the boot.
> >>
> >>>
> >>> Thanks
> >>> PJ
> >>
> >>
> >> I spoke too soon. To get more information: Please see
> >>
> >> https://wiki.xenproject.org/wiki/Reporting_Bugs_against_Xen_Project
> >>
> >> and
> >>
> >> https://wiki.xenproject.org/wiki/Xen_Serial_Console
> >>
> >> or alternatively at least add "vga=keep".
> >>
> >> pjwelsh
> >
> >
> >
>
>
>


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-14 Thread Johnny Hughes
Dave,

Take a look at this kernel as it is the one I think we are going to
release (or a slightly newer 4.9.2x from kernel.org LTS). This version
has some newer settings that are more redhat/fedora/centos base kernel
like WRT what is a module and what is built into the kernel, etc.

https://people.centos.org/hughesjr/4.9.x/

Thanks,
Johnny Hughes

On 04/14/2017 05:16 AM, Anderson, Dave wrote:
> List moderator: feel free to delete my previous large message with 
> attachments that's in the moderation queue...it's now obsolete anyway.
> 
> 
> I have found a fix/workaround for my reboot issues with Xen 4.6.3-12 + Kernel 
> 4.9.13:
> 
> Once I finally got serial output all the way through the boot process 
> (xen+dom0) I discovered the stack trace:
> 
> [Firmware Bug]: CPU7: APIC id mismatch. Firmware: 0 APIC: 7
> installing Xen timer for CPU 8
> [Firmware Bug]: CPU8: APIC id mismatch. Firmware: 0 APIC: 20
> smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
> [ cut here ]
> kernel BUG at arch/x86/kernel/cpu/common.c:997!
> invalid opcode:  [#1] SMP
> Modules linked in:
> CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.13-22.el7.x86_64 #1
> Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
> random: fast init done
> task: 880058a8c4c0 task.stack: c900400b4000
> RIP: e030:[]  [] 
> identify_secondary_cpu+0x57/0x80
> RSP: e02b:c900400b7f08  EFLAGS: 00010086
> RAX: ffe4 RBX: 88005d80a020 RCX: 81c5be68
> RDX: 0001 RSI: 0005 RDI: 0005
> RBP: c900400b7f18 R08: 00cb R09: 0004
> R10:  R11: 0006 R12: 0008
> R13:  R14:  R15: 
> FS:  () GS:88005d80() knlGS:
> CS:  e033 DS: 002b ES: 002b CR0: 80050033
> CR2:  CR3: 01c07000 CR4: 00042660
> Stack:
>  0008  c900400b7f28 8104e94e
>  c900400b7f40 81029925  c900400b7f50
>  810299a0   
> Call Trace:
>  [] smp_store_cpu_info+0x3e/0x40
>  [] cpu_bringup+0x35/0x90
>  [] cpu_bringup_and_idle+0x20/0x40
> Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 bb da 00 00 00 
> 44 89 e6 e8 24 03 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 0f b7 8b d4 
> 00 00 00 89 c2 44 89 e6 48 c7 c7 98 87 a6 81 
> RIP  [] identify_secondary_cpu+0x57/0x80
>  RSP 
> ---[ end trace dc5563100443876e ]---
> 
> I surmised that reducing the number of dom0 vcpu might solve this issue (they 
> were unbounded)
> 
> In testing adding "dom0_max_vcpus=4 dom0_vcpus_pin" to the 
> GRUB_CMDLINE_XEN_DEFAULT line in /etc/default/grub and re-running 
> grub2-mkconfig has resulted in the system I have that never booted Xen 
> 4.6.3-12 + Kernel 4.9.13, booting every single time out of 5-10 tests.
> 
> 
> So...I don't know if there's a race condition somewhere, or what...but...so 
> far this workaround has not failed me.
> 
> Thanks,
> -Dave
> 
> 
> 
>> On Fri, Apr 7, 2017 at 6:58 AM, PJ Welsh >> wrote:
>>> I've not gotten any bites from my posting on the xen-devel mailing list.
>>> Here is the only one to-date:
>>> https://lists.xen.org/archives/html/xen-devel/2017-04/msg01069.html
>>>
>>> From that email, there needs to be some hypervisor messages.
>>>
>>> Does anyone know how to produce the hypervisor messages? I've already
>>
>>> removed the rhgb and quiet options from the boot.
>>
>>>
>>> Thanks
>>> PJ
>>
>>
>> I spoke too soon. To get more information: Please see
>>
>> https://wiki.xenproject.org/wiki/Reporting_Bugs_against_Xen_Project
>>
>> and
>>
>> https://wiki.xenproject.org/wiki/Xen_Serial_Console
>>
>> or alternatively at least add "vga=keep".
>>
>> pjwelsh
> 
> 
> 






Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-14 Thread Anderson, Dave
List moderator: feel free to delete my previous large message with attachments 
that's in the moderation queue...it's now obsolete anyway.


I have found a fix/workaround for my reboot issues with Xen 4.6.3-12 + Kernel 
4.9.13:

Once I finally got serial output all the way through the boot process 
(xen+dom0) I discovered the stack trace:

[Firmware Bug]: CPU7: APIC id mismatch. Firmware: 0 APIC: 7
installing Xen timer for CPU 8
[Firmware Bug]: CPU8: APIC id mismatch. Firmware: 0 APIC: 20
smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[ cut here ]
kernel BUG at arch/x86/kernel/cpu/common.c:997!
invalid opcode:  [#1] SMP
Modules linked in:
CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.13-22.el7.x86_64 #1
Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
random: fast init done
task: 880058a8c4c0 task.stack: c900400b4000
RIP: e030:[]  [] 
identify_secondary_cpu+0x57/0x80
RSP: e02b:c900400b7f08  EFLAGS: 00010086
RAX: ffe4 RBX: 88005d80a020 RCX: 81c5be68
RDX: 0001 RSI: 0005 RDI: 0005
RBP: c900400b7f18 R08: 00cb R09: 0004
R10:  R11: 0006 R12: 0008
R13:  R14:  R15: 
FS:  () GS:88005d80() knlGS:
CS:  e033 DS: 002b ES: 002b CR0: 80050033
CR2:  CR3: 01c07000 CR4: 00042660
Stack:
 0008  c900400b7f28 8104e94e
 c900400b7f40 81029925  c900400b7f50
 810299a0   
Call Trace:
 [] smp_store_cpu_info+0x3e/0x40
 [] cpu_bringup+0x35/0x90
 [] cpu_bringup_and_idle+0x20/0x40
Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 bb da 00 00 00 
44 89 e6 e8 24 03 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 0f b7 8b d4 00 
00 00 89 c2 44 89 e6 48 c7 c7 98 87 a6 81 
RIP  [] identify_secondary_cpu+0x57/0x80
 RSP 
---[ end trace dc5563100443876e ]---

I surmised that reducing the number of dom0 vCPUs might solve this issue (they 
were unbounded).

In testing, adding "dom0_max_vcpus=4 dom0_vcpus_pin" to the 
GRUB_CMDLINE_XEN_DEFAULT line in /etc/default/grub and re-running 
grub2-mkconfig has made the one system I have that never booted Xen 4.6.3-12 
+ Kernel 4.9.13 boot every single time across 5-10 tests.
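
For anyone who wants to script that edit, here is a minimal sketch. It assumes the CentOS 7 layout, where the Xen options live in GRUB_CMDLINE_XEN_DEFAULT in /etc/default/grub. The helper name add_xen_opts and the sample starting line are made up for illustration, and the demo only touches a scratch file, so nothing on the system changes until you decide to apply it.

```shell
#!/bin/sh
# Hypothetical helper: append options to GRUB_CMDLINE_XEN_DEFAULT in a grub
# defaults file, skipping any option whose name is already on the line.
add_xen_opts() {
    file=$1; shift
    for opt in "$@"; do
        # Compare on the option name only (the text before any '=').
        name=${opt%%=*}
        if ! grep -q "^GRUB_CMDLINE_XEN_DEFAULT=.*[\" ]${name}[=\" ]" "$file"; then
            sed -i "s|^\(GRUB_CMDLINE_XEN_DEFAULT=\"[^\"]*\)\"|\1 ${opt}\"|" "$file"
        fi
    done
}

# Demo on a scratch copy (sample contents, not a real system file).
demo=$(mktemp)
echo 'GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=6G,max:8G cpuinfo"' > "$demo"
add_xen_opts "$demo" dom0_max_vcpus=4 dom0_vcpus_pin
cat "$demo"

# After reviewing the result, apply the same edit to /etc/default/grub and
# regenerate the config. The output path below is the usual BIOS location
# and is an assumption; EFI systems keep grub.cfg elsewhere.
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

Running the helper a second time with the same options should leave the line unchanged, which makes it safe to re-run while experimenting with different vCPU limits.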


So...I don't know if there's a race condition somewhere, or what...but...so far 
this workaround has not failed me.

Thanks,
-Dave



> On Fri, Apr 7, 2017 at 6:58 AM, PJ Welsh wrote:
>> I've not gotten any bites from my posting on the xen-devel mailing list.
>> Here is the only one to-date:
>> https://lists.xen.org/archives/html/xen-devel/2017-04/msg01069.html
>> 
>> From that email, there needs to be some hypervisor messages.
>> 
>> Does anyone know how to produce the hypervisor messages? I've already
>> removed the rhgb and quiet options from the boot.
>>
>> Thanks
>> PJ
> 
> 
> I spoke too soon. To get more information: Please see
> 
> https://wiki.xenproject.org/wiki/Reporting_Bugs_against_Xen_Project
> 
> and
> 
> https://wiki.xenproject.org/wiki/Xen_Serial_Console
> 
> or alternatively at least add "vga=keep".
> 
> pjwelsh




Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-07 Thread PJ Welsh
On Fri, Apr 7, 2017 at 6:58 AM, PJ Welsh  wrote:

> I've not gotten any bites from my posting on the xen-devel mailing list.
> Here is the only one to-date:
> https://lists.xen.org/archives/html/xen-devel/2017-04/msg01069.html
>
> From that email, there needs to be some hypervisor messages.
>
> Does anyone know how to produce the hypervisor messages? I've already
> removed the rhgb and quiet options from the boot.
>
> Thanks
> PJ
>
>
I spoke too soon. To get more information:
Please see
https://wiki.xenproject.org/wiki/Reporting_Bugs_against_Xen_Project
and
https://wiki.xenproject.org/wiki/Xen_Serial_Console
or alternatively at least add "vga=keep".
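
As a rough sketch of what those pages boil down to, these are the two sets of options I would start from. The variable names XEN_OPTS and DOM0_OPTS are only placeholders for this example; the first string belongs on the Xen (hypervisor) line of the boot entry and the second on the dom0 kernel line. The baud rate and port are assumptions and need to match your serial setup.

```shell
#!/bin/sh
# Hypervisor options: put Xen's console on the first serial port (and VGA)
# and raise the log levels. "noreboot" keeps the box from resetting on a crash.
XEN_OPTS="com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all noreboot"

# dom0 kernel options: hvc0 is the Xen console device as seen from dom0.
DOM0_OPTS="console=hvc0 earlyprintk=xen"

# Print both fragments so they can be pasted into the boot entry.
printf 'xen:  %s\n' "$XEN_OPTS"
printf 'dom0: %s\n' "$DOM0_OPTS"
```

On a machine that does get through boot, "xl dmesg" prints the hypervisor log from dom0, which is usually what xen-devel asks for first.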

pjwelsh


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-07 Thread PJ Welsh
I've not gotten any bites from my posting on the xen-devel mailing list.
Here is the only one to-date:
https://lists.xen.org/archives/html/xen-devel/2017-04/msg01069.html

From that email, there needs to be some hypervisor messages.

Does anyone know how to produce the hypervisor messages? I've already
removed the rhgb and quiet options from the boot.

Thanks
PJ

On Thu, Apr 6, 2017 at 11:21 PM, Sarah Newman  wrote:

> On 03/28/2017 02:55 PM, PJ Welsh wrote:
> > The mystery gets more interesting... I now have a CentOS 7.3 Dell R710
> > server doing the exact same thing of rebooting immediately after the Xen
> > kernel load. Just to note this is a second system and not just the first
> > system with an update. I hope I'm not introducing something odd. The only
> > "interesting" thing I have done for historical reasons is to change the
> > following /etc/sysconfig/grub line:
> > GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=6G,max:8G cpuinfo com1=115200,8n1
> > console=com1,tty loglvl=all guest_loglvl=all"
> > But I've done that on other servers without issue. In fact I have a Dell
> > R710 that DOES work with CentOS 7 and the new kernel... so confused.
>
> I am having no similar issues with several Dell Proliant DL160p's and
> CentOS 6.
> They are either G5 or G6, I don't recall which.
>
> --Sarah
>


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-06 Thread Mark L Sung
Interesting and challenging, too. It seems to be related to Xen's
compatibility with the Dell board BIOS.

I have a Dell R710 and an R720 with the same Xen version, but the outcomes
are completely different: the R720 is slow and reboots too.

xlord

On Apr 5, 2017 23:42, "PJ Welsh"  wrote:

> On Tue, Apr 4, 2017 at 11:13 AM, Johnny Hughes  wrote:
>
>> On 03/28/2017 04:55 PM, PJ Welsh wrote:
>> > The mystery gets more interesting... I now have a CentOS 7.3 Dell R710
>> > server doing the exact same thing of rebooting immediately after the Xen
>> > kernel load. Just to note this is a second system and not just the first
>> > system with an update. I hope I'm not introducing something odd. The
>> > only "interesting" thing I have done for historical reasons is to change
>> > the following /etc/sysconfig/grub line:
>> > GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=6G,max:8G cpuinfo com1=115200,8n1
>> > console=com1,tty loglvl=all guest_loglvl=all"
>> > But I've done that on other servers without issue. In fact I have a Dell
>> > R710 that DOES work with CentOS 7 and the new kernel... so confused.
>> >
>> > On Fri, Mar 24, 2017 at 1:44 PM, Sarah Newman wrote:
>> >
>> > On 03/24/2017 11:35 AM, PJ Welsh wrote:
>> > > As a follow up I was able to test fresh install on Dell R710 and a Dell
>> > > R620 with success on CentOS 7.3 without issue on the new kernel.  My new
>> > > plan will be to just move this C6 to one of the C7 I just created.
>> >
>> > That sounds like a compiler problem, since I think the C6 and C7
>> > kernels are built from the same source.
>> >
>>
>> OK, I have a new CentOS-6 4.9.20-26 kernel here for testing:
>>
>> https://people.centos.org/hughesjr/4.9.16/6/x86_64/
>>
>> I am building the el7 one right now as well, it will be at:
>>
>> https://people.centos.org/hughesjr/4.9.16/7/x86_64/
>>
>> George and I found some issues with the 4.9.x config files for the xen
>> kernel.  Hopefully this one is much more stable as it has many changes
>> from the fedora/rhel type configs now (what is built into the kernel,
>> what is loaded as a kernel module, etc.)
>>
>> Please test these kernels so we can get them released.
>>
>> Thanks,
>> Johnny Hughes
>>
>>
>>
>>
>>
> CentOS-6 4.9.20-26 kernel exhibits the same constant
> kernel-start-then-reboot issue when booting under the "CentOS Linux, with
> Xen hypervisor" grub2 menu option. However, it *does* properly boot under
> the "CentOS Linux (4.9.20-25.el7.x86_64) 7 (Core)" grub2 menu option!
>
> A semi-close look at the /etc/grub2.cfg yields no discernible difference
> between a properly functional Dell R620 and the non-properly functioning
> Dell R710.
>
> Sorry, I had been distracted with other issues and have not yet submitted
> information to the xen-devel group yet.
>
> Thanks
> PJ
>


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-04 Thread Johnny Hughes
On 03/28/2017 04:55 PM, PJ Welsh wrote:
> The mystery gets more interesting... I now have a CentOS 7.3 Dell R710
> server doing the exact same thing of rebooting immediately after the Xen
> kernel load. Just to note this is a second system and not just the first
> system with an update. I hope I'm not introducing something odd. The
> only "interesting" thing I have done for historical reasons is to change
> the following /etc/sysconfig/grub line:
> GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=6G,max:8G cpuinfo com1=115200,8n1
> console=com1,tty loglvl=all guest_loglvl=all"
> But I've done that on other servers without issue. In fact I have a Dell
> R710 that DOES work with CentOS 7 and the new kernel... so confused.
> 
> On Fri, Mar 24, 2017 at 1:44 PM, Sarah Newman wrote:
> 
> On 03/24/2017 11:35 AM, PJ Welsh wrote:
> > As a follow up I was able to test fresh install on Dell R710 and a Dell
> > R620 with success on CentOS 7.3 without issue on the new kernel.  My new
> > plan will be to just move this C6 to one of the C7 I just created.
> 
> That sounds like a compiler problem, since I think the C6 and C7
> kernels are built from the same source.
> 

OK, I have a new CentOS-6 4.9.20-26 kernel here for testing:

https://people.centos.org/hughesjr/4.9.16/6/x86_64/

I am building the el7 one right now as well, it will be at:

https://people.centos.org/hughesjr/4.9.16/7/x86_64/

George and I found some issues with the 4.9.x config files for the xen
kernel.  Hopefully this one is much more stable as it has many changes
from the fedora/rhel type configs now (what is built into the kernel,
what is loaded as a kernel module, etc.)

Please test these kernels so we can get them released.

Thanks,
Johnny Hughes






Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-29 Thread George Dunlap
On Tue, Mar 28, 2017 at 10:55 PM, PJ Welsh  wrote:
> The mystery gets more interesting... I now have a CentOS 7.3 Dell R710
> server doing the exact same thing of rebooting immediately after the Xen
> kernel load. Just to note this is a second system and not just the first
> system with an update. I hope I'm not introducing something odd. The only
> "interesting" thing I have done for historical reasons is to change the
> following /etc/sysconfig/grub line:
> GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=6G,max:8G cpuinfo com1=115200,8n1
> console=com1,tty loglvl=all guest_loglvl=all"
> But I've done that on other servers without issue. In fact I have a Dell
> R710 that DOES work with CentOS 7 and the new kernel... so confused.

PJ,

Thanks for your testing and report.  Would you mind reporting this on
xen-devel?  If there's actually a bug in the Linux  4.9.x on Xen boot
path on your box, I don't think Johnny or I are going to be able to
help you debug it. :-)

 -George


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-29 Thread Johnny Hughes
Maybe the BIOS versions are different on the two machines if they are
the same models.  Different disc controllers or modes set up?  Different
NICs or other add on cards?
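
One way to answer those questions is to capture the same short summary on each machine and diff the two files. This is only a rough sketch: it assumes dmidecode and lspci are installed and that it runs as root, hw_summary is a made-up helper name, and the grep filter is just a guess at which PCI devices are worth comparing.

```shell
#!/bin/sh
# Hypothetical helper: print the firmware and controller details that are
# worth comparing between a host that boots and one that does not.
hw_summary() {
    for key in bios-version bios-release-date system-product-name; do
        printf '%s: ' "$key"
        # Falls back to "unavailable" if dmidecode is missing or not run as root.
        dmidecode -s "$key" 2>/dev/null || echo "unavailable"
    done
    # Storage controllers and NICs; prints nothing if lspci is not installed.
    lspci 2>/dev/null | grep -Ei 'raid|sas|sata|ethernet' || true
}

hw_summary
```

Redirect the output to a file on each host (for example "hw_summary > good.hw" on the working R710 and "hw_summary > bad.hw" on the failing one) and compare them with "diff good.hw bad.hw".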

On 03/28/2017 04:55 PM, PJ Welsh wrote:
> The mystery gets more interesting... I now have a CentOS 7.3 Dell R710
> server doing the exact same thing of rebooting immediately after the Xen
> kernel load. Just to note this is a second system and not just the first
> system with an update. I hope I'm not introducing something odd. The
> only "interesting" thing I have done for historical reasons is to change
> the following /etc/sysconfig/grub line:
> GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=6G,max:8G cpuinfo com1=115200,8n1
> console=com1,tty loglvl=all guest_loglvl=all"
> But I've done that on other servers without issue. In fact I have a Dell
> R710 that DOES work with CentOS 7 and the new kernel... so confused.
> 
> On Fri, Mar 24, 2017 at 1:44 PM, Sarah Newman wrote:
> 
> On 03/24/2017 11:35 AM, PJ Welsh wrote:
> > As a follow up I was able to test fresh install on Dell R710 and a Dell
> > R620 with success on CentOS 7.3 without issue on the new kernel.  My new
> > plan will be to just move this C6 to one of the C7 I just created.
> 
> That sounds like a compiler problem, since I think the C6 and C7
> kernels are built from the same source.
> 
> --Sarah







Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-28 Thread Alvin Starr

I ran into this also.

Go back to an older kernel. At least, that was my solution until a kernel 
came out that would boot.


It seems that some kernel builds are not friendly to Xen.


On 03/28/2017 05:55 PM, PJ Welsh wrote:
The mystery gets more interesting... I now have a CentOS 7.3 Dell R710 
server doing the exact same thing of rebooting immediately after the 
Xen kernel load. Just to note this is a second system and not just the 
first system with an update. I hope I'm not introducing something odd. 
The only "interesting" thing I have done for historical reasons is to 
change the following /etc/sysconfig/grub line:
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=6G,max:8G cpuinfo com1=115200,8n1 
console=com1,tty loglvl=all guest_loglvl=all"
But I've done that on other servers without issue. In fact I have a 
Dell R710 that DOES work with CentOS 7 and the new kernel... so confused.


On Fri, Mar 24, 2017 at 1:44 PM, Sarah Newman wrote:


On 03/24/2017 11:35 AM, PJ Welsh wrote:
> As a follow up I was able to test fresh install on Dell R710 and a Dell
> R620 with success on CentOS 7.3 without issue on the new kernel.  My new
> plan will be to just move this C6 to one of the C7 I just created.

That sounds like a compiler problem, since I think the C6 and C7
kernels are built from the same source.

--Sarah


--
Alvin Starr   ||   voice: (905)513-7688
Netvel Inc.   ||   Cell:  (416)806-0133
al...@netvel.net  ||



Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-28 Thread PJ Welsh
The mystery gets more interesting... I now have a CentOS 7.3 Dell R710
server doing the exact same thing of rebooting immediately after the Xen
kernel load. Just to note this is a second system and not just the first
system with an update. I hope I'm not introducing something odd. The only
"interesting" thing I have done for historical reasons is to change the
following /etc/sysconfig/grub line:
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=6G,max:8G cpuinfo com1=115200,8n1
console=com1,tty loglvl=all guest_loglvl=all"
But I've done that on other servers without issue. In fact I have a Dell
R710 that DOES work with CentOS 7 and the new kernel... so confused.

On Fri, Mar 24, 2017 at 1:44 PM, Sarah Newman  wrote:

> On 03/24/2017 11:35 AM, PJ Welsh wrote:
> > As a follow up I was able to test fresh install on Dell R710 and a Dell
> > R620 with success on CentOS 7.3 without issue on the new kernel.  My new
> > plan will be to just move this C6 to one of the C7 I just created.
>
> That sounds like a compiler problem, since I think the C6 and C7 kernels
> are built from the same source.
>
> --Sarah


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-24 Thread Sarah Newman
On 03/24/2017 11:35 AM, PJ Welsh wrote:
> As a follow up I was able to test fresh install on Dell R710 and a Dell
> R620 with success on CentOS 7.3 without issue on the new kernel.  My new
> plan will be to just move this C6 to one of the C7 I just created.

That sounds like a compiler problem, since I think the C6 and C7 kernels are 
built from the same source.

--Sarah


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-24 Thread PJ Welsh
As a follow up I was able to test fresh install on Dell R710 and a Dell
R620 with success on CentOS 7.3 without issue on the new kernel.  My new
plan will be to just move this C6 to one of the C7 I just created.

On Wed, Mar 22, 2017, 6:27 AM PJ Welsh  wrote:

> The last few lines are
> NMI watchdog: disabled CPU0 hardware events not enabled
> NMI watchdog: shutting down hard lockup detector on all CPUS
> installing Xen timer for CPU1
> installing Xen timer for CPU2
> installing Xen timer for CPU3
> installing Xen timer for CPU4
> installing Xen timer for CPU5
> installing Xen timer for CPU6
>
> Here is the screen shot:
> https://goo.gl/photos/yNQqaQY9bJBWQ84X8
> It stops at CPU6. This is a dual socket server with 2x 6core L5639 CPUs
> (HT disabled). I'm surprised to see it stop at 6.
>
> Thanks
> PJ
>
>
>
>
> On Tue, Mar 21, 2017 at 1:39 PM, Kevin Stange  wrote:
>
> On 03/21/2017 07:48 AM, PJ Welsh wrote:
> > On Mon, Mar 20, 2017 at 5:21 PM, Ricardo J. Barberis
> > > wrote:
> >
> > On Monday 20/03/2017, PJ Welsh wrote:
> > > Still just starts the kernel and within 4 seconds reboots with 4.9.16-24.
> > > Thanks
> > > PJ
> >
> > Edit grub's entry and add "noreboot" to your xen parameters, maybe when the
> > kernel panics xen detects it and automatically reboots it.
> >
> >
> >
> > "noreboot" grub.conf option still produced nothing other than a flashing
> > cursor on the top left. Also, neither num-lock nor caps-lock respond at
> > this time... I seem no closer with helpful information other than, "it's
> > broken" :(
> > Here is the grub.conf stanza for the kernel:
> > title CentOS (4.9.16-24.el6.centos.plus.x86_64)
> > root (hd0,1)
> > > kernel /boot/xen.gz dom0_mem=3G,max:3G cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all noreboot
> > > module /boot/vmlinuz-4.9.16-24.el6.centos.plus.x86_64 ro root=UUID=bc0727e1-882c-4fbc-a4d9-e4cf754d72b7 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet reboot=pci max_loop=64
> > > module /boot/initramfs-4.9.16-24.el6.centos.plus.x86_64.img
>
> Try removing "rhgb" and "quiet" from your boot options as well.
>
> --
> Kevin Stange
> Chief Technology Officer
> Steadfast | Managed Infrastructure, Datacenter and Cloud Services
> 800 S Wells, Suite 190 | Chicago, IL 60607
> 312.602.2689 X203 | Fax: 312.602.2688
> ke...@steadfast.net | www.steadfast.net


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-22 Thread PJ Welsh
The last few lines are
NMI watchdog: disabled CPU0 hardware events not enabled
NMI watchdog: shutting down hard lockup detector on all CPUS
installing Xen timer for CPU1
installing Xen timer for CPU2
installing Xen timer for CPU3
installing Xen timer for CPU4
installing Xen timer for CPU5
installing Xen timer for CPU6

Here is the screen shot:
https://goo.gl/photos/yNQqaQY9bJBWQ84X8
It stops at CPU6. This is a dual socket server with 2x 6core L5639 CPUs (HT
disabled). I'm surprised to see it stop at 6.

Thanks
PJ




On Tue, Mar 21, 2017 at 1:39 PM, Kevin Stange  wrote:

> On 03/21/2017 07:48 AM, PJ Welsh wrote:
> > On Mon, Mar 20, 2017 at 5:21 PM, Ricardo J. Barberis
> > > wrote:
> >
> > On Monday 20/03/2017, PJ Welsh wrote:
> > > Still just starts the kernel and within 4 seconds reboots with 4.9.16-24.
> > > Thanks
> > > PJ
> >
> > Edit grub's entry and add "noreboot" to your xen parameters, maybe when the
> > kernel panics xen detects it and automatically reboots it.
> >
> >
> >
> > "noreboot" grub.conf option still produced nothing other than a flashing
> > cursor on the top left. Also, neither num-lock nor caps-lock respond at
> > this time... I seem no closer with helpful information other than, "it's
> > broken" :(
> > Here is the grub.conf stanza for the kernel:
> > title CentOS (4.9.16-24.el6.centos.plus.x86_64)
> > root (hd0,1)
> > kernel /boot/xen.gz dom0_mem=3G,max:3G cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all noreboot
> > module /boot/vmlinuz-4.9.16-24.el6.centos.plus.x86_64 ro root=UUID=bc0727e1-882c-4fbc-a4d9-e4cf754d72b7 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet reboot=pci max_loop=64
> > module /boot/initramfs-4.9.16-24.el6.centos.plus.x86_64.img
>
> Try removing "rhgb" and "quiet" from your boot options as well.
>
> --
> Kevin Stange
> Chief Technology Officer
> Steadfast | Managed Infrastructure, Datacenter and Cloud Services
> 800 S Wells, Suite 190 | Chicago, IL 60607
> 312.602.2689 X203 | Fax: 312.602.2688
> ke...@steadfast.net | www.steadfast.net


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-20 Thread PJ Welsh
Sure thing. I will need to wait until AM Tuesday USA time to test now.
Thanks
PJ

On Mon, Mar 20, 2017 at 5:21 PM, Ricardo J. Barberis 
wrote:

> On Monday 20/03/2017, PJ Welsh wrote:
> > Still just starts the kernel and within 4 seconds reboots with 4.9.16-24.
> > Thanks
> > PJ
>
> Edit grub's entry and add "noreboot" to your xen parameters, maybe when the
> kernel panics xen detects it and automatically reboots it.
>
>
> > On Mon, Mar 20, 2017 at 2:23 PM, Johnny Hughes wrote:
> > > On 03/20/2017 01:20 PM, PJ Welsh wrote:
> > > > No warning, but still just reboots with no notice.
> > > > Is there any other system info you need?
> > > > Thanks
> > > > PJ
> > >
> > > Try the new 4.9.16-24 packages there now.  (reworked the config based on
> > > a fedora kernel)
> > >
> > > > On Mon, Mar 20, 2017 at 11:47 AM, Johnny Hughes wrote:
> > > >
> > > > On 03/20/2017 11:21 AM, Johnny Hughes wrote:
> > > > > On 03/20/2017 08:35 AM, PJ Welsh wrote:
> > > > >> Updating my CentOS 6.8 Xen server with new 4.9.13 kernel yields a kernel
> > > > >> boot message of a few like "APIC ID MISMATCH" and the system reboots
> > > > >> immediately without any other bits of info. This is on a Dell R710 with
> > > > >> 64GB RAM and 2x 6-core Intel CPU's.
> > > > >> As an additional test, I installed and attempted to run the current
> > > > >> "testing" kernel of 4.9.16 with the exact same results.
> > > > >>
> > > > >> Anyone have an idea? The 3.18.x series runs without issue of course.
> > > > >
> > > > > I think the APIC ID MISMATCH is an expected and ignorable error .. see:
> > > > >
> > > > > https://patchwork.kernel.org/patch/9539933/
> > > > >
> > > > > I applied that patch and I am building a 4.9.16-23 right now, I'll
> > > > > publish it when it finishes.  Maybe with the error gone we can get a
> > > > > better error in the console.
> > > >
> > > > OK, try the 4.9.16-23 packages here:
> > > >
> > > > https://people.centos.org/hughesjr/4.9.16/x86_64/
> > >
> --
> Ricardo J. Barberis
> Linux user No. 250625: http://counter.li.org/
> LFS user No. 5121: http://www.linuxfromscratch.org/
> Senior SysAdmin / IT Architect - www.DonWeb.com


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-20 Thread Ricardo J. Barberis
On Monday 20/03/2017, PJ Welsh wrote:
> Still just starts the kernel and within 4 seconds reboots with 4.9.16-24.
> Thanks
> PJ

Edit grub's entry and add "noreboot" to your Xen parameters; maybe when the 
kernel panics, Xen detects it and automatically reboots the machine.
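
Something along these lines should do it on a legacy grub.conf, where the hypervisor is loaded by the "kernel /boot/xen.gz" line. This is only a sketch: add_noreboot is a made-up helper name and the stanza below is sample content written to a scratch file, not your real /boot/grub/grub.conf.

```shell
#!/bin/sh
# Hypothetical helper: append "noreboot" to the xen.gz line of a legacy
# grub.conf, leaving lines that already carry the option untouched.
add_noreboot() {
    sed -i '/^[[:space:]]*kernel[[:space:]].*xen\.gz/{/[[:space:]]noreboot/!s/$/ noreboot/;}' "$1"
}

# Try it on a scratch stanza (sample contents, not a real system file).
conf=$(mktemp)
cat > "$conf" <<'EOF'
title CentOS (xen example)
    root (hd0,1)
    kernel /boot/xen.gz dom0_mem=3G,max:3G loglvl=all guest_loglvl=all
    module /boot/vmlinuz-example ro root=/dev/sda2
EOF
add_noreboot "$conf"
grep 'xen\.gz' "$conf"
```

Only the hypervisor line is changed; the dom0 kernel ("module") lines are left alone, since "noreboot" is a Xen option rather than a Linux one.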


> On Mon, Mar 20, 2017 at 2:23 PM, Johnny Hughes  wrote:
> > On 03/20/2017 01:20 PM, PJ Welsh wrote:
> > > No warning, but still just reboots with no notice.
> > > Is there any other system info you need?
> > > Thanks
> > > PJ
> >
> > Try the new 4.9.16-24 packages there now.  (reworked the config based on
> > a fedora kernel)
> >
> > > On Mon, Mar 20, 2017 at 11:47 AM, Johnny Hughes wrote:
> > >
> > > On 03/20/2017 11:21 AM, Johnny Hughes wrote:
> > > > On 03/20/2017 08:35 AM, PJ Welsh wrote:
> > > >> Updating my CentOS 6.8 Xen server with new 4.9.13 kernel yields a kernel
> > > >> boot message of a few like "APIC ID MISMATCH" and the system reboots
> > > >> immediately without any other bits of info. This is on a Dell R710 with
> > > >> 64GB RAM and 2x 6-core Intel CPU's.
> > > >> As an additional test, I installed and attempted to run the current
> > > >> "testing" kernel of 4.9.16 with the exact same results.
> > > >>
> > > >> Anyone have an idea? The 3.18.x series runs without issue of course.
> > > >
> > > > I think the APIC ID MISMATCH is an expected and ignorable error .. see:
> > > >
> > > > https://patchwork.kernel.org/patch/9539933/
> > > >
> > > > I applied that patch and I am building a 4.9.16-23 right now, I'll
> > > > publish it when it finishes.  Maybe with the error gone we can get a
> > > > better error in the console.
> > >
> > > OK, try the 4.9.16-23 packages here:
> > >
> > > https://people.centos.org/hughesjr/4.9.16/x86_64/
> >
-- 
Ricardo J. Barberis
Linux user No. 250625: http://counter.li.org/
LFS user No. 5121: http://www.linuxfromscratch.org/
Senior SysAdmin / IT Architect - www.DonWeb.com


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-20 Thread PJ Welsh
Still just starts the kernel and within 4 seconds reboots with 4.9.16-24.
Thanks
PJ

On Mon, Mar 20, 2017 at 2:23 PM, Johnny Hughes  wrote:

> On 03/20/2017 01:20 PM, PJ Welsh wrote:
> > No warning, but still just reboots with no notice.
> > Is there any other system info you need?
> > Thanks
> > PJ
> >
>
>
>
> Try the new 4.9.16-24 packages there now.  (reworked the config based on
> a fedora kernel)
>
>
>
>
> > On Mon, Mar 20, 2017 at 11:47 AM, Johnny Hughes wrote:
> >
> > On 03/20/2017 11:21 AM, Johnny Hughes wrote:
> > > On 03/20/2017 08:35 AM, PJ Welsh wrote:
> > >> Updating my CentOS 6.8 Xen server with new 4.9.13 kernel yields a kernel
> > >> boot message of a few like "APIC ID MISMATCH" and the system reboots
> > >> immediately without any other bits of info. This is on a Dell R710 with
> > >> 64GB RAM and 2x 6-core Intel CPU's.
> > >> As an additional test, I installed and attempted to run the current
> > >> "testing" kernel of 4.9.16 with the exact same results.
> > >>
> > >> Anyone have an idea? The 3.18.x series runs without issue of course.
> > >
> > > I think the APIC ID MISMATCH is an expected and ignorable error .. see:
> > >
> > > https://patchwork.kernel.org/patch/9539933/
> > >
> > > I applied that patch and I am building a 4.9.16-23 right now, I'll
> > > publish it when it finishes.  Maybe with the error gone we can get a
> > > better error in the console.
> >
> > OK, try the 4.9.16-23 packages here:
> >
> > https://people.centos.org/hughesjr/4.9.16/x86_64/
>
>


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-20 Thread Johnny Hughes
On 03/20/2017 01:20 PM, PJ Welsh wrote:
> No warning, but still just reboots with no notice.
> Is there any other system info you need?
> Thanks
> PJ
> 



Try the new 4.9.16-24 packages there now.  (reworked the config based on
a fedora kernel)




> On Mon, Mar 20, 2017 at 11:47 AM, Johnny Hughes wrote:
>
> On 03/20/2017 11:21 AM, Johnny Hughes wrote:
> > On 03/20/2017 08:35 AM, PJ Welsh wrote:
> >> Updating my CentOS 6.8 Xen server with new 4.9.13 kernel yields a kernel
> >> boot message of a few like "APIC ID MISMATCH" and the system reboots
> >> immediately without any other bits of info. This is on a Dell R710 with
> >> 64GB RAM and 2x 6-core Intel CPU's.
> >> As an additional test, I installed and attempted to run the current
> >> "testing" kernel of 4.9.16 with the exact same results.
> >>
> >> Anyone have an idea? The 3.18.x series runs without issue of course.
> >
> > I think the APIC ID MISMATCH is an expected and ignorable error .. see:
> >
> > https://patchwork.kernel.org/patch/9539933/
> >
> > I applied that patch and I am building a 4.9.16-23 right now, I'll
> > publish it when it finishes.  Maybe with the error gone we can get a
> > better error in the console.
>
> OK, try the 4.9.16-23 packages here:
>
> https://people.centos.org/hughesjr/4.9.16/x86_64/





Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-20 Thread PJ Welsh
No warning, but still just reboots with no notice.
Is there any other system info you need?
Thanks
PJ

On Mon, Mar 20, 2017 at 11:47 AM, Johnny Hughes  wrote:

> On 03/20/2017 11:21 AM, Johnny Hughes wrote:
> > On 03/20/2017 08:35 AM, PJ Welsh wrote:
> >> Updating my CentOS 6.8 Xen server with new 4.9.13 kernel yields a kernel
> >> boot message of a few like "APIC ID MISMATCH" and the system reboots
> >> immediately without any other bits of info. This is on a Dell R710 with
> >> 64GB RAM and 2x 6-core Intel CPU's.
> >> As an additional test, I installed and attempted to run the current
> >> "testing" kernel of 4.9.16 with the exact same results.
> >>
> >> Anyone have an idea? The 3.18.x series runs without issue of course.
> >>
> >
> > I think the APIC ID MISMATCH is an expected and ignorable error .. see:
> >
> > https://patchwork.kernel.org/patch/9539933/
> >
> > I applied that patch and I am building a 4.9.16-23 right now, I'll
> > publish it when it finishes.  Maybe with the error gone we can get a
> > better error in the console.
> >
> >
>
> OK, try the 4.9.16-23 packages here:
>
> https://people.centos.org/hughesjr/4.9.16/x86_64/
>
>
>


Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-20 Thread Johnny Hughes
On 03/20/2017 11:21 AM, Johnny Hughes wrote:
> On 03/20/2017 08:35 AM, PJ Welsh wrote:
>> Updating my CentOS 6.8 Xen server with new 4.9.13 kernel yields a kernel
>> boot message of a few like "APIC ID MISMATCH" and the system reboots
>> immediately without any other bits of info. This is on a Dell R710 with
>> 64GB RAM and 2x 6-core Intel CPU's.
>> As an additional test, I installed and attempted to run the current
>> "testing" kernel of 4.9.16 with the exact same results.
>>
>> Anyone have an idea? The 3.18.x series runs without issue of course.
>>
> 
> I think the APIC ID MISMATCH is an expected and ignorable error .. see:
> 
> https://patchwork.kernel.org/patch/9539933/
> 
> I applied that patch and I am building a 4.9.16-23 right now, I'll
> publish it when it finishes.  Maybe with the error gone we can get a
> better error in the console.
> 
> 

OK, try the 4.9.16-23 packages here:

https://people.centos.org/hughesjr/4.9.16/x86_64/






Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-20 Thread Johnny Hughes
On 03/20/2017 08:35 AM, PJ Welsh wrote:
> Updating my CentOS 6.8 Xen server with new 4.9.13 kernel yields a kernel
> boot message of a few like "APIC ID MISMATCH" and the system reboots
> immediately without any other bits of info. This is on a Dell R710 with
> 64GB RAM and 2x 6-core Intel CPU's.
> As an additional test, I installed and attempted to run the current
> "testing" kernel of 4.9.16 with the exact same results.
> 
> Anyone have an idea? The 3.18.x series runs without issue of course.
> 

I think the APIC ID MISMATCH is an expected and ignorable error .. see:

https://patchwork.kernel.org/patch/9539933/

I applied that patch and I am building a 4.9.16-23 right now, I'll
publish it when it finishes.  Maybe with the error gone we can get a
better error in the console.






Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-03-20 Thread Johnny Hughes
On 03/20/2017 08:35 AM, PJ Welsh wrote:
> Updating my CentOS 6.8 Xen server with new 4.9.13 kernel yields a kernel
> boot message of a few like "APIC ID MISMATCH" and the system reboots
> immediately without any other bits of info. This is on a Dell R710 with
> 64GB RAM and 2x 6-core Intel CPU's.
> As an additional test, I installed and attempted to run the current
> "testing" kernel of 4.9.16 with the exact same results.
> 
> Anyone have an idea? The 3.18.x series runs without issue of course.

Try this kernel (the noarch kernel-doc package is not done yet, but it is
not a required package):

https://people.centos.org/hughesjr/4.9.16/x86_64/

Let me know if that works or not .. we can try adjusting some other
config settings.

Don't worry about the centos.plus dist tag .. that will change when we
submit it via the regular process.

Thanks,
Johnny Hughes



