So I couldn't actually wait till the morning.

I set rbd cache to false and tried to create the same number of instances,
but the same issue happened again.
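
For reference, this is roughly what I changed on the compute nodes (a
minimal sketch; [client] is the usual section for librbd settings, and
running instances need to be restarted to pick it up):

    # /etc/ceph/ceph.conf on the hypervisors
    [client]
    rbd cache = false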

I want to note that if I reboot a virtual machine that has this issue, it
works without any problem afterwards.

Does this mean that over-utilization could be the cause of my problem? The
cluster I have has bad hardware, and this is the only logical explanation I
can reach.

By bad hardware I mean, for instance, Core i5 processors; I can also see the
%wa (I/O wait) reaching 50-60%.
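
For anyone who wants to compare numbers, I am reading %wa from top; an
equivalent per-device view (assuming sysstat is installed for iostat) would
be:

    top -b -n 1 | head -3    # "wa" field on the %Cpu(s) line is I/O wait
    iostat -x 1 5            # extended per-device I/O stats, 5 samples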

Thank you


On Thu, Oct 27, 2016 at 4:13 PM, Jason Dillaman <[email protected]> wrote:

> The only effect I could see from a highly overloaded system would be
> that the OSDs might appear to become unresponsive to the VMs. Are any of
> you using cache tiering or the librbd cache? For the latter, there was one
> issue [1] that can result in read corruption and that affects Hammer and
> prior releases.
>
> [1] http://tracker.ceph.com/issues/16002
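>
> (In case it helps, one way to check for both; the admin socket path here
> is illustrative and has to be enabled in the [client] section before the
> VM is started:)
>
>     # cache tiering shows up as "tier_of" / "cache_mode" on pool lines
>     ceph osd dump | grep -E 'tier_of|cache_mode'
>
>     # query the effective librbd cache setting over a client admin socket
>     ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok config get rbd_cache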
>
> On Thu, Oct 27, 2016 at 1:34 AM, Ahmed Mostafa <[email protected]>
> wrote:
>
>> This is more or less the same behaviour I have in my environment.
>>
>> By any chance, is anyone running their OSDs and their hypervisors on the
>> same machine?
>>
>> And could a high workload, like starting 40-60 or more virtual machines
>> at once, have an effect on this problem?
>>
>>
>> On Thursday, 27 October 2016, <[email protected]> wrote:
>>
>>>
>>>
>>> Most of the filesystem corruption caused instances to crash; we saw that
>>> after a shutdown / restart
>>>
>>> (triggered by OpenStack portal buttons or by OS commands inside the
>>> instances).
>>>
>>>
>>>
>>> Some were detected early: we saw filesystem errors in the OS logs on the
>>> instances.
>>>
>>> Then we ran a filesystem check (fsck / chkdsk) immediately, and the issue
>>> was fixed.
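>>>
>>> (For Linux guests that is something like the following, run from a
>>> rescue environment; the device name is only an example. For Windows we
>>> use chkdsk:)
>>>
>>>     fsck -y /dev/vdb1      # Linux guest filesystem, example device
>>>     chkdsk C: /f           # inside a Windows instance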
>>>
>>>
>>>
>>> Keynes Lee 李俊賢
>>> Direct: +886-2-6612-1025
>>> Mobile: +886-9-1882-3787
>>> Fax: +886-2-6612-1991
>>> E-Mail: [email protected]
>>>
>>>
>>> *From:* Jason Dillaman [mailto:[email protected]]
>>> *Sent:* Wednesday, October 26, 2016 9:38 PM
>>> *To:* Keynes Lee/WHQ/Wistron <[email protected]>
>>> *Cc:* [email protected]; ceph-users <[email protected]>
>>> *Subject:* Re: [ceph-users] [EXTERNAL] Instance filesystem corrupt
>>>
>>>
>>>
>>> I am not aware of any similar reports against librbd on Firefly. Do you
>>> use any configuration overrides? Does the filesystem corruption appear
>>> while the instances are running or only after a shutdown / restart of the
>>> instance?
>>>
>>>
>>>
>>> On Wed, Oct 26, 2016 at 12:46 AM, <[email protected]> wrote:
>>>
>>> No, we are using Firefly (0.80.7).
>>>
>>> We are using HPE Helion OpenStack 2.1.5, and the Ceph version embedded
>>> in it is Firefly.
>>>
>>>
>>>
>>> An upgrade is planned, but it will not happen soon.
>>>
>>>
>>> *From:* Will.Boege [mailto:[email protected]]
>>> *Sent:* Wednesday, October 26, 2016 12:03 PM
>>> *To:* Keynes Lee/WHQ/Wistron <[email protected]>;
>>> [email protected]
>>> *Subject:* Re: [EXTERNAL] [ceph-users] Instance filesystem corrupt
>>>
>>>
>>>
>>> Just out of curiosity, did you recently upgrade to Jewel?
>>>
>>>
>>>
>>> *From: *ceph-users <[email protected]> on behalf of "
>>> [email protected]" <[email protected]>
>>> *Date: *Tuesday, October 25, 2016 at 10:52 PM
>>> *To: *"[email protected]" <[email protected]>
>>> *Subject: *[EXTERNAL] [ceph-users] Instance filesystem corrupt
>>>
>>>
>>>
>>> We are using OpenStack + Ceph.
>>>
>>> Recently we found a lot of filesystem corruption incidents on instances.
>>>
>>> Some of them were correctable and fixed by fsck, but the others had no
>>> luck: they were simply corrupt and could never start up again.
>>>
>>>
>>>
>>> We found this issue on various instance operating systems. They are:
>>>
>>> RedHat 4 / CentOS 7 / Windows 2012
>>>
>>>
>>>
>>> Could someone please advise us on a troubleshooting direction?
>>>
>>>
>>>
>>>
>>>
>>> Keynes Lee 李俊賢
>>> Direct: +886-2-6612-1025
>>> Mobile: +886-9-1882-3787
>>> Fax: +886-2-6612-1991
>>> E-Mail: [email protected]
>>>
>>>
>>>
>>> --
>>>
>>> Jason
>>>
>>
>
>
> --
> Jason
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
