Well, in my case I now know my issue.

It is indeed an over-utilization issue, but not one related to Ceph itself.

The cluster is connected via 1G interfaces, which basically get saturated
by all the bandwidth generated by these instances trying to read and mount
their root filesystems, which are stored in Ceph.

So that explains why restarting the VM later solves the problem.
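A rough back-of-the-envelope sketch makes this plausible (the batch size and
the per-instance boot read volume below are assumptions, not measurements):

    # Rough estimate of why a shared 1 Gbps link collapses during a mass boot.
    # Instance count and per-VM read volume are assumed, not measured.
    link_mb_s = 1000 / 8            # ~125 MB/s theoretical ceiling of a 1 Gbps link
    instances = 50                  # assumed batch size (40-60 mentioned in this thread)
    boot_read_mb = 300              # assumed data each VM reads from its RBD root at boot
    print(instances * boot_read_mb / link_mb_s)  # ~120 s of pure transfer time, best case
    print(link_mb_s / instances)                 # ~2.5 MB/s left per guest during the boot storm

Once the boot storm is over, a single restarted VM has the whole link to
itself, which matches the behaviour described above.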



On Friday, 28 October 2016, Ahmed Mostafa <[email protected]> wrote:

> So I couldn't actually wait till the morning.
>
> I set rbd cache to false and tried to create the same number of instances,
> but the same issue happened again.
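For reference, a minimal sketch of the client-side setting being toggled here,
as it would appear in ceph.conf on the hypervisor (the [client] section is the
usual place; note that running VMs only pick the change up when their QEMU
process restarts and reopens the image, and QEMU's own disk cache mode can also
influence the effective behaviour):

    [client]
    rbd cache = false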
>
> I want to note that if I reboot any of the virtual machines that have
> this issue, they work without any problem afterwards.
>
> Does this mean that over-utilization could be the cause of my problem?
> The cluster I have has bad hardware, and this is the only logical
> explanation I can reach.
>
> By bad hardware I mean, for instance, Core i5 processors; I can also see %wa
> reaching 50-60%.
>
> Thank you
>
>
> On Thu, Oct 27, 2016 at 4:13 PM, Jason Dillaman <[email protected]> wrote:
>
>> The only effect I could see out of a highly overloaded system would be
>> that the OSDs might appear to become unresponsive to the VMs. Are any of
>> you using cache tiering or librbd cache? For the latter, there was one
>> issue [1] that can result in read corruption and that affects Hammer and
>> prior releases.
>>
>> [1] http://tracker.ceph.com/issues/16002
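For anyone verifying which case applies: if an admin socket is configured for
the QEMU/librbd client, the effective cache setting can be read from it at
runtime (the socket path below is illustrative and depends on the client's
admin socket setting):

    ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok config get rbd_cache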
>>
>> On Thu, Oct 27, 2016 at 1:34 AM, Ahmed Mostafa <[email protected]> wrote:
>>
>>> This is more or less the same behaviour I have in my environment.
>>>
>>> By any chance, is anyone running their OSDs and their hypervisors on the
>>> same machine?
>>>
>>> And could a high workload, like starting 40-60 or more virtual machines,
>>> have an effect on this problem?
>>>
>>>
>>> On Thursday, 27 October 2016, <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> Most of the filesystem corruption causes instances to crash; we saw that
>>>> after a shutdown / restart
>>>>
>>>> (triggered by OpenStack portal buttons or by OS commands inside the
>>>> instances).
>>>>
>>>>
>>>>
>>>> Some are detected early; we see filesystem errors in the OS logs on the
>>>> instances.
>>>>
>>>> Then we run a filesystem check (fsck / chkdsk) immediately, and the issue
>>>> is fixed.
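A minimal sketch of that recovery step, assuming the root volume appears as
/dev/vda1 inside a rescue environment (device names and filesystem types are
assumptions; the filesystem must not be mounted while it is checked):

    fsck -f -y /dev/vda1     # ext3/ext4: force a full check and repair automatically
    xfs_repair /dev/vda1     # XFS equivalent, also run on an unmounted volume
    chkdsk C: /f             # Windows guests, from the recovery console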
>>>>
>>>>
>>>>
>>>>
>>>> *Keynes Lee 李俊賢*
>>>> Direct: +886-2-6612-1025 | Mobile: +886-9-1882-3787 | Fax: +886-2-6612-1991
>>>> E-Mail: [email protected]
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Jason Dillaman [mailto:[email protected]]
>>>> *Sent:* Wednesday, October 26, 2016 9:38 PM
>>>> *To:* Keynes Lee/WHQ/Wistron <[email protected]>
>>>> *Cc:* [email protected]; ceph-users <[email protected]>
>>>> *Subject:* Re: [ceph-users] [EXTERNAL] Instance filesystem corrupt
>>>>
>>>>
>>>>
>>>> I am not aware of any similar reports against librbd on Firefly. Do you
>>>> use any configuration overrides? Does the filesystem corruption appear
>>>> while the instances are running, or only after a shutdown / restart of the
>>>> instance?
>>>>
>>>>
>>>>
>>>> On Wed, Oct 26, 2016 at 12:46 AM, <[email protected]> wrote:
>>>>
>>>> No, we are using Firefly (0.80.7).
>>>>
>>>> We are using HPE Helion OpenStack 2.1.5, and the Ceph version embedded in
>>>> it is Firefly.
>>>>
>>>>
>>>>
>>>> An upgrade is planned, but it will not happen soon.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Will.Boege [mailto:[email protected]]
>>>> *Sent:* Wednesday, October 26, 2016 12:03 PM
>>>> *To:* Keynes Lee/WHQ/Wistron <[email protected]>;
>>>> [email protected]
>>>> *Subject:* Re: [EXTERNAL] [ceph-users] Instance filesystem corrupt
>>>>
>>>>
>>>>
>>>> Just out of curiosity, did you recently upgrade to Jewel?
>>>>
>>>>
>>>>
>>>> *From: *ceph-users <[email protected]> on behalf of "
>>>> [email protected]" <[email protected]>
>>>> *Date: *Tuesday, October 25, 2016 at 10:52 PM
>>>> *To: *"[email protected]" <[email protected]>
>>>> *Subject: *[EXTERNAL] [ceph-users] Instance filesystem corrupt
>>>>
>>>>
>>>>
>>>> We are using OpenStack + Ceph.
>>>>
>>>> Recently we have seen a lot of filesystem corruption incidents on instances.
>>>>
>>>> Some of them are correctable and fixed by fsck, but the others have no such
>>>> luck; they are corrupt and can never start up again.
>>>>
>>>>
>>>>
>>>> We found this issue on various instance operating systems:
>>>>
>>>> RedHat 4 / CentOS 7 / Windows 2012
>>>>
>>>>
>>>>
>>>> Could someone please advise us on a troubleshooting direction?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *Keynes Lee 李俊賢*
>>>> Direct: +886-2-6612-1025 | Mobile: +886-9-1882-3787 | Fax: +886-2-6612-1991
>>>> E-Mail: [email protected]
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Jason
>>>>
>>>
>>
>>
>> --
>> Jason
>>
>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
