Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2018-04-25 Thread mark

On 04/24/18 17:33, Stephen John Smoogen wrote:
> On 24 April 2018 at 17:16,   wrote:
>> Adam Tauno Williams wrote:
>>> On Mon, 2017-08-07 at 15:26 +, KM wrote:
>>>> All, this happens on all of our CentOS 7 VMs, but as stated in the
>>>> email trail, the file softlockup_thresh does not exist.  Should it be
>>>> added?  What is the best way to get rid of this behavior?
>>>> Thanks in advance, and sorry if I missed something along the way. KM
>>>
>>> Yes, I see this behavior as well.  Never have found a solution - other
>>> than increasing the threshold and pretending it doesn't happen.
>>
>> We see it a fair bit, and this is on servers running on bare metal, not VMs.
>
> On bare metal it usually means some hardware has gone into an
> uninterruptible IRQ and the CPU is waiting for it to go away. I saw
> this with systems with Green disk drives a while ago. Anything going
> to talk to the drive would just sit for a long time while the drive
> spun up, the cache was validated, etc. Other culprits would be USB
> disks when some other USB item started needing input. Since it is a
> hub environment, they can spew for a while and the CPU would report
> a soft lockup.

Not hardly. We discovered green drives were nothing we wanted right after they
came out. And I'm talking about work, with servers: all drives are either
enterprise, as we bought them, or NAS-rated (e.g. WD Red).


mark
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2018-04-24 Thread Stephen John Smoogen
On 24 April 2018 at 17:16,   wrote:
> Adam Tauno Williams wrote:
>> On Mon, 2017-08-07 at 15:26 +, KM wrote:
>>> All, this happens on all of our CentOS 7 VMs, but as stated in the
>>> email trail, the file softlockup_thresh does not exist.  Should it be
>>> added?  What is the best way to get rid of this behavior?
>>> Thanks in advance, and sorry if I missed something along the way. KM
>>
>> Yes, I see this behavior as well.  Never have found a solution - other
>> than increasing the threshold and pretending it doesn't happen.
>>
> We see it a fair bit, and this is on servers running on bare metal, not VMs.
>

On bare metal it usually means some hardware has gone into an
uninterruptible IRQ and the CPU is waiting for it to go away. I saw
this with systems with Green disk drives a while ago. Anything going
to talk to the drive would just sit for a long time while the drive
spun up, the cache was validated, etc. Other culprits would be USB
disks when some other USB item started needing input. Since it is a
hub environment, they can spew for a while and the CPU would report
a soft lockup.

>   mark



-- 
Stephen J Smoogen.


Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2018-04-24 Thread m . roth
Adam Tauno Williams wrote:
> On Mon, 2017-08-07 at 15:26 +, KM wrote:
>> All, this happens on all of our CentOS 7 VMs, but as stated in the
>> email trail, the file softlockup_thresh does not exist.  Should it be
>> added?  What is the best way to get rid of this behavior?
>> Thanks in advance, and sorry if I missed something along the way. KM
>
> Yes, I see this behavior as well.  Never have found a solution - other
> than increasing the threshold and pretending it doesn't happen.
>
We see it a fair bit, and this is on servers running on bare metal, not VMs.

  mark



Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2018-04-24 Thread Adam Tauno Williams
On Mon, 2017-08-07 at 15:26 +, KM wrote:
> All, this happens on all of our CentOS 7 VMs, but as stated in the
> email trail, the file softlockup_thresh does not exist.  Should it be
> added?  What is the best way to get rid of this behavior?
> Thanks in advance, and sorry if I missed something along the way. KM

Yes, I see this behavior as well.  Never have found a solution - other
than increasing the threshold and pretending it doesn't happen.



Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2017-08-10 Thread KM

 
Never saw this email. Did anyone get it?  Anyone know how to fix this? Thanks
again.

  From: KM <info...@yahoo.com>
 To: CentOS mailing list <centos@centos.org>
 Sent: Monday, August 7, 2017 11:26 AM
 Subject: Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

All, this happens on all of our CentOS 7 VMs, but as stated in the email trail,
the file softlockup_thresh does not exist.  Should it be added?  What is the
best way to get rid of this behavior?
Thanks in advance, and sorry if I missed something along the way. KM

  From: correomm <corre...@gmail.com>
 To: CentOS mailing list <centos@centos.org> 
 Sent: Thursday, August 18, 2016 1:55 PM
 Subject: Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]
  
Yes, I tried it, but it does not exist:

vmguest # cat /proc/sys/kernel/softlockup_thresh
cat: /proc/sys/kernel/softlockup_thresh: No such file or directory

On Thu, Aug 18, 2016 at 2:06 PM, Carlos A. Carnero Delgado <
carloscarn...@gmail.com> wrote:

> 2016-08-18 12:39 GMT-04:00 correomm <corre...@gmail.com>:
>
> > This bug is reported only on the VM's with CentOS 7 running on VMware
> > ESXi 5.1.
> > The vSphere performance graph shows high CPU consumption and disk
> > activity only on VM's with CentOS 7. Sometimes I cannot connect
> > remotely with ssh (timeout error).
> >
>
> I'm also seeing those errors in several servers, running under 5.5.
> Currently investigating if this
> <https://kb.vmware.com/selfservice/microsites/search.do?language=en_US=displayKC=1009996>
> has anything to do with it (the resource overcommit bit).
>
> HTH,
> Carlos.


Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2017-08-08 Thread KM
Never saw this email. Did anyone get it?  Anyone know how to fix this? Thanks
again.

  From: KM <info...@yahoo.com>
 To: CentOS mailing list <centos@centos.org>
 Sent: Monday, August 7, 2017 11:26 AM
 Subject: Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

All, this happens on all of our CentOS 7 VMs, but as stated in the email trail,
the file softlockup_thresh does not exist.  Should it be added?  What is the
best way to get rid of this behavior?
Thanks in advance, and sorry if I missed something along the way. KM



Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2017-08-07 Thread KM
All, this happens on all of our CentOS 7 VMs, but as stated in the email trail,
the file softlockup_thresh does not exist.  Should it be added?  What is the
best way to get rid of this behavior?
Thanks in advance, and sorry if I missed something along the way. KM



Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2016-12-08 Thread KM
Not sure if this was the last email on this.  If not, ignore me. However, I found
a post for newer operating systems that says to set the watchdog_thresh value
instead of softlockup_thresh:
http://askubuntu.com/questions/592412/why-is-there-no-proc-sys-kernel-softlockup-thresh
This is an Ubuntu post, but on my CentOS 7 system this parameter exists, and
softlockup_thresh does not.  I have set it, but I will need to see if I still
get the CPU lockup messages on my VM.
I hope this helps. KM
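
For anyone checking their own system, here is a minimal sketch of the lookup described above. The function name lockup_knob and its optional directory argument are made up for illustration; the only things taken from the thread are the two file names under /proc/sys/kernel. On kernels that expose watchdog_thresh, the soft lockup message is reported after twice that many seconds, which the sketch prints as well.

```shell
# Hypothetical helper: report which lockup-threshold knob this kernel exposes.
# The optional first argument overrides the directory to look in
# (defaults to /proc/sys/kernel); it exists only to make the sketch testable.
lockup_knob() {
    dir="${1:-/proc/sys/kernel}"
    if [ -r "$dir/watchdog_thresh" ]; then
        thresh=$(cat "$dir/watchdog_thresh")
        # The soft lockup warning fires after 2 * watchdog_thresh seconds.
        echo "watchdog_thresh=$thresh soft_lockup_after=$((thresh * 2))s"
    elif [ -r "$dir/softlockup_thresh" ]; then
        # Older kernels used this name instead.
        echo "softlockup_thresh=$(cat "$dir/softlockup_thresh")"
    else
        echo "no lockup threshold knob found in $dir"
    fi
}

lockup_knob
```

To actually raise the threshold, something like `sysctl -w kernel.watchdog_thresh=20` run as root should work on CentOS 7 (20 is only an illustrative value), and a `kernel.watchdog_thresh = 20` line in a file under /etc/sysctl.d/ would keep it across reboots. Note that this only quiets the warning; it does not address whatever is stalling the CPU.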





Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2016-08-18 Thread correomm
No, I don't use snapshots.

It is a Dell 2 TB Enterprise 3.5" SATA Hard Drive.

The disk activity of the host is normal to low. Few VM's.

On Thu, Aug 18, 2016 at 2:32 PM, JJB  wrote:

> 2016-08-18 12:39 GMT-04:00 correomm :
>
>>> This bug is reported only on the VM's with CentOS 7 running on VMware
>>> ESXi 5.1.
>>> The vSphere performance graph shows high CPU consumption and disk
>>> activity only on VM's with CentOS 7. Sometimes I cannot connect
>>> remotely with ssh (timeout error).
>>
>> I'm also seeing those errors in several servers, running under 5.5.
>> Currently investigating if this
>> <https://kb.vmware.com/selfservice/microsites/search.do?language=en_US=displayKC=1009996>
>> has anything to do with it (the resource overcommit bit).
>
> Does this happen (only) while taking or consolidating snapshots? The VM is
> suspended during these operations and the OS isn't too crazy about it,
> especially if you have slow storage.
>
> Jack


Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2016-08-18 Thread Carlos A. Carnero Delgado
2016-08-18 13:32 GMT-04:00 JJB :

>> I'm also seeing those errors in several servers, running under 5.5.
>> Currently investigating if this
>> <https://kb.vmware.com/selfservice/microsites/search.do?language=en_US=displayKC=1009996>
>> has anything to do with it (the resource overcommit bit).
>
> Does this happen (only) while taking or consolidating snapshots? The VM is
> suspended during these operations and the OS isn't too crazy about it,
> especially if you have slow storage.
>

Nope, no snapshots. Just plain running. In fact, many times the guests are
under light usage (internal instrumentation, no external VMware stats).
We're investigating because we do have reasons to believe that our provider
is probably overcommitting or overselling (not out of malice, AFAIK).

HTH,
Carlos.


Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2016-08-18 Thread correomm
Yes, I tried it, but it does not exist:

vmguest # cat /proc/sys/kernel/softlockup_thresh
cat: /proc/sys/kernel/softlockup_thresh: No such file or directory

On Thu, Aug 18, 2016 at 2:06 PM, Carlos A. Carnero Delgado <
carloscarn...@gmail.com> wrote:

> 2016-08-18 12:39 GMT-04:00 correomm :
>
> > This bug is reported only on the VM's with CentOS 7 running on VMware
> > ESXi 5.1.
> > The vSphere performance graph shows high CPU consumption and disk
> > activity only on VM's with CentOS 7. Sometimes I cannot connect
> > remotely with ssh (timeout error).
> >
>
> I'm also seeing those errors in several servers, running under 5.5.
> Currently investigating if this
> <https://kb.vmware.com/selfservice/microsites/search.do?language=en_US=displayKC=1009996>
> has anything to do with it (the resource overcommit bit).
>
> HTH,
> Carlos.


Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2016-08-18 Thread JJB



> 2016-08-18 12:39 GMT-04:00 correomm :
>
>> This bug is reported only on the VM's with CentOS 7 running on VMware
>> ESXi 5.1.
>> The vSphere performance graph shows high CPU consumption and disk
>> activity only on VM's with CentOS 7. Sometimes I cannot connect
>> remotely with ssh (timeout error).
>
> I'm also seeing those errors in several servers, running under 5.5.
> Currently investigating if this
> <https://kb.vmware.com/selfservice/microsites/search.do?language=en_US=displayKC=1009996>
> has anything to do with it (the resource overcommit bit).

Does this happen (only) while taking or consolidating snapshots? The VM
is suspended during these operations and the OS isn't too crazy about
it, especially if you have slow storage.


Jack



Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2016-08-18 Thread Carlos A. Carnero Delgado
2016-08-18 12:39 GMT-04:00 correomm :

> This bug is reported only on the VM's with CentOS 7 running on VMware
> ESXi 5.1.
> The vSphere performance graph shows high CPU consumption and disk activity only
> on VM's with CentOS 7. Sometimes I cannot connect remotely with ssh
> (timeout error).
>

I'm also seeing those errors in several servers, running under 5.5.
Currently investigating if this
<https://kb.vmware.com/selfservice/microsites/search.do?language=en_US=displayKC=1009996>
has anything to do with it (the resource overcommit bit).

HTH,
Carlos.


Re: [CentOS] BUG: soft lockup - CPU#0 stuck for 36s! [swapper/0:0]

2016-08-18 Thread John R Pierce

On 8/18/2016 9:39 AM, correomm wrote:

> This bug is reported only on the VM's with CentOS 7 running on VMware
> ESXi 5.1.
> The vSphere performance graph shows high CPU consumption and disk activity only
> on VM's with CentOS 7. Sometimes I cannot connect remotely with ssh
> (timeout error).


FWIW, I've had no problems with CentOS 7.x VMs running in ESXi 5.5.0 GA 
(build 1331820)





--
john r pierce, recycling bits in santa cruz
