How up to date is your VM environment? We saw something very similar last year 
with Linux VMs running newish kernels. It turned out that newer kernels support 
a new feature of the vmxnet3 adapter that triggered a bug in ESXi. The fix was 
released sometime last year in ESXi 6.5 U1; the workaround was to set an option 
in the VM config.
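
To see whether a guest is exposed to this combination at all, you can list which interfaces are bound to the vmxnet3 driver alongside the running kernel release. This is only a minimal sketch, assuming a Linux guest with sysfs mounted at /sys; the function names are made up for illustration, and it reports the driver in use rather than detecting the ESXi bug itself.

```python
import os
import platform


def nic_drivers(sys_net="/sys/class/net"):
    """Map each network interface to its kernel driver name.

    Interfaces without a backing device (e.g. lo) map to None.
    Returns an empty dict if sys_net does not exist.
    """
    drivers = {}
    if not os.path.isdir(sys_net):
        return drivers
    for iface in sorted(os.listdir(sys_net)):
        # On Linux, <iface>/device/driver is a symlink to the driver directory.
        link = os.path.join(sys_net, iface, "device", "driver")
        if os.path.islink(link):
            drivers[iface] = os.path.basename(os.path.realpath(link))
        else:
            drivers[iface] = None
    return drivers


def vmxnet3_interfaces(drivers):
    """Return the interface names whose driver is vmxnet3."""
    return [iface for iface, drv in drivers.items() if drv == "vmxnet3"]


if __name__ == "__main__":
    found = nic_drivers()
    print("kernel release:", platform.release())
    for iface, drv in found.items():
        print(f"{iface}: {drv}")
    print("vmxnet3 interfaces:", vmxnet3_interfaces(found))
```

If the last line prints a non-empty list on a guest running a newer kernel, the ESXi build and the KB below are worth checking.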

 

https://kb.vmware.com/s/article/2151480

 

 

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Youzhong Yang
Sent: 21 January 2018 19:50
To: Brad Hubbard <bhubb...@redhat.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

 

As someone suggested, I installed the linux-generic-hwe-16.04 package on Ubuntu 
16.04 to get the 17.10 kernel, then rebooted all VMs. Here is what I 
observed:

- ceph monitor node froze upon reboot; in another case it froze after a few 
minutes

- ceph OSD hosts easily froze

- ceph admin node (which runs no ceph service but ceph-deploy) never freezes

- ceph rgw nodes and ceph mgr so far so good

 

Here are two images I captured:

 

https://drive.google.com/file/d/11hMJwhCF6Tj8LD3nlpokG0CB_oZqI506/view?usp=sharing

https://drive.google.com/file/d/1tzDQ3DYTnfDHh_hTQb0ISZZ4WZdRxHLv/view?usp=sharing

 

Thanks.

 

On Sat, Jan 20, 2018 at 7:03 PM, Brad Hubbard <bhubb...@redhat.com> wrote:

On Fri, Jan 19, 2018 at 11:54 PM, Youzhong Yang <youzh...@gmail.com> wrote:
> I don't think it's a hardware issue. All the hosts are VMs. By the way, using
> the same set of VMware hypervisors, I switched back to Ubuntu 16.04 last
> night; so far so good, no freeze.

Too little information to make any sort of assessment I'm afraid but,
at this stage, this doesn't sound like a ceph issue.


>
> On Fri, Jan 19, 2018 at 8:50 AM, Daniel Baumann <daniel.baum...@bfh.ch>
> wrote:
>>
>> Hi,
>>
>> On 01/19/18 14:46, Youzhong Yang wrote:
>> > Just wondering if anyone has seen the same issue, or it's just me.
>>
>> we're using debian with our own backported kernels and ceph, works rock
>> solid.
>>
>> what you're describing sounds more like hardware issues to me. if you
>> don't fully "trust"/have confidence in your hardware (and your logs
>> don't reveal anything), I'd recommend running some burn-in tests
>> (memtest, cpuburn, etc.) on them for 24 hours/machine to rule out
>> cpu/ram/etc. issues.
>>
>> Regards,
>> Daniel
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>




--
Cheers,
Brad

 

