What is the issue exactly?

On Fri, Oct 28, 2016 at 2:47 AM, <[email protected]> wrote:

> I think this issue may not be related to your poor hardware.
>
>
>
> Our cluster has 3 Ceph monitors and 4 OSD nodes.
>
>
>
> Each server has
>
> 2 CPUs ( Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz ) and 32 GB of memory
>
> Each OSD node has 2 SSDs for journals and 8 SATA disks ( 6 TB / 7200 rpm )
>
> All of them are connected to each other by 4 x 10 Gbps links ( 802.3ad bonding )
>
>
>
> The utilization of our Ceph cluster is only about 13%, and most of the time
> the IOPS stays under 1500.
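>
> ( We check that roughly as follows -- just a sketch, the exact commands
> depend on your release:
>
>     ceph -s                  # the "client io" line shows the current op/s
>     ceph osd pool stats      # per-pool read/write op rates
>
> and the client io line rarely goes above ~1500 op/s. )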
>
>
>
> We are still getting this issue.
>
>
>
>
>
> *From:* Ahmed Mostafa [mailto:[email protected]]
> *Sent:* Friday, October 28, 2016 6:30 AM
> *To:* [email protected]
> *Cc:* Keynes Lee/WHQ/Wistron <[email protected]>;
> [email protected]
> *Subject:* Re: [ceph-users] Instance filesystem corrupt
>
>
>
> So I couldn't actually wait until the morning.
>
>
>
> I set rbd cache to false and tried to create the same number of instances,
> but the same issue happened again.
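>
> ( For reference, this is roughly what I used -- a sketch, the exact
> ceph.conf location and section may differ in your deployment:
>
>     # /etc/ceph/ceph.conf on the compute / hypervisor nodes
>     [client]
>     rbd cache = false
>
> and the affected instances were then stopped and started again so the
> librbd clients picked up the new setting. )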
>
>
>
> I want to note that if I reboot any of the virtual machines that have this
> issue, they work without any problem afterwards.
>
>
>
> Does this mean that over-utilization could be the cause of my problem?
> The cluster I have has bad hardware, and this is the only logical
> explanation I can reach.
>
>
>
> By bad hardware I mean Core i5 processors, for instance, and I can see the
> %wa reaching 50-60% too.
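>
> ( That figure comes from plain iostat / top on the nodes -- a sketch:
>
>     iostat -x 5     # per-disk %util and await, plus CPU %iowait
>     top             # the "wa" field in the CPU summary line
>
> so the 50-60% is the iowait reported there. )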
>
>
>
> Thank you
>
>
>
>
>
> On Thu, Oct 27, 2016 at 4:13 PM, Jason Dillaman <[email protected]>
> wrote:
>
> The only effect I could see out of a highly overloaded system would be
> that the OSDs might appear to become unresponsive to the VMs. Are any of
> you using cache tiering or librbd cache? For the latter, there was one
> issue [1] that can result in read corruption that affects hammer and prior
> releases.
>
>
>
> [1] http://tracker.ceph.com/issues/16002
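>
> (To check which librbd your hypervisors are actually running, something
> like the following should work, depending on the distro:
>
>     rpm -q librbd1       # RHEL / CentOS
>     dpkg -l librbd1      # Debian / Ubuntu
>
> Anything at hammer or older is potentially affected by that ticket.)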
>
>
>
> On Thu, Oct 27, 2016 at 1:34 AM, Ahmed Mostafa <[email protected]>
> wrote:
>
> This is more or less the same behaviour I have in my environment.
>
>
>
> By any chance, is anyone running their OSDs and their hypervisors on the
> same machine?
>
>
>
> And could a high workload, like starting 40-60 or more virtual machines at
> once, have an effect on this problem?
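>
> ( Next time I start that many instances at once I plan to watch for slow /
> blocked requests while they boot -- a sketch, assuming a reasonably recent
> release:
>
>     ceph health detail                 # lists any "requests are blocked" warnings
>     ceph -w | grep -i 'slow request'   # slow request messages in the cluster log
>
> to see whether the OSDs really become unresponsive during the boot storm. )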
>
>
>
>
> On Thursday, 27 October 2016, <[email protected]> wrote:
>
>
>
> Most of the filesystem corruption causes instances to crash; we see that
> after a shutdown / restart
>
> ( triggered by OpenStack portal buttons or by OS commands inside the
> instances )
>
>
>
> Some are detected early: we see filesystem errors in the OS logs on the
> instances.
>
> Then we run a filesystem check ( fsck / chkdsk ) immediately, and the issue
> is fixed.
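>
> ( If an instance can no longer boot, the same check could also be run from
> a node with the kernel RBD client while the instance is shut down -- just a
> rough sketch, pool and image names below are only placeholders, and whether
> the kernel client can map the image depends on its format and features:
>
>     rbd map <pool>/<image-name>
>     fsck -n /dev/rbd0p1      # read-only check first
>     rbd unmap /dev/rbd0
> )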
>
>
>
>
> *Keynes Lee 李俊賢*
>
> Direct: +886-2-6612-1025 | Mobile: +886-9-1882-3787 | Fax: +886-2-6612-1991
>
> E-Mail: [email protected]
>
>
>
>
>
> *From:* Jason Dillaman [mailto:[email protected]]
> *Sent:* Wednesday, October 26, 2016 9:38 PM
> *To:* Keynes Lee/WHQ/Wistron <[email protected]>
> *Cc:* [email protected]; ceph-users <[email protected]>
> *Subject:* Re: [ceph-users] [EXTERNAL] Instance filesystem corrupt
>
>
>
> I am not aware of any similar reports against librbd on Firefly. Do you
> use any configuration overrides? Does the filesystem corruption appear
> while the instances are running or only after a shutdown / restart of the
> instance?
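>
> (By configuration overrides I mean anything set beyond the defaults on the
> client side -- for example, a quick way to look, assuming the standard
> config path:
>
>     grep -A 20 '\[client\]' /etc/ceph/ceph.conf    # on the compute nodes
>
> rbd cache settings, cache tiering on the pools, and so on.)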
>
>
>
> On Wed, Oct 26, 2016 at 12:46 AM, <[email protected]> wrote:
>
> No, we are using Firefly (0.80.7).
>
> We are using HPE Helion OpenStack 2.1.5, and the Ceph version embedded in
> it is Firefly.
>
>
>
> An upgrade is planned, but it will not happen soon.
>
>
>
>
>
>
>
>
>
>
>
> *From:* Will.Boege [mailto:[email protected]]
> *Sent:* Wednesday, October 26, 2016 12:03 PM
> *To:* Keynes Lee/WHQ/Wistron <[email protected]>;
> [email protected]
> *Subject:* Re: [EXTERNAL] [ceph-users] Instance filesystem corrupt
>
>
>
> Just out of curiosity, did you recently upgrade to Jewel?
>
>
>
> *From: *ceph-users <[email protected]> on behalf of "
> [email protected]" <[email protected]>
> *Date: *Tuesday, October 25, 2016 at 10:52 PM
> *To: *"[email protected]" <[email protected]>
> *Subject: *[EXTERNAL] [ceph-users] Instance filesystem corrupt
>
>
>
> We are using OpenStack + Ceph.
>
> Recently we found a lot of filesystem corruption incidents on instances.
>
> Some of them are correctable and fixed by fsck, but the others have no
> luck: they just stay corrupt and can never start up again.
>
>
>
> We found this issue on various instance operating systems. They are
>
> RedHat 4 / CentOS 7 / Windows 2012
>
>
>
> Could someone please advise us on a troubleshooting direction?
>
>
>
>
>
>
> *Keynes Lee 李俊賢*
>
> Direct: +886-2-6612-1025 | Mobile: +886-9-1882-3787 | Fax: +886-2-6612-1991
>
> E-Mail: [email protected]
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Jason
>
>
>
>
>
> --
>
> Jason
>
>
>
>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
