On 04/11/2014 02:45 PM, Greg Poirier wrote:
So... our storage problems persisted for about 45 minutes. I gave an
entire hypervisor's worth of VMs (approx. 30) time to recover, and none
of them recovered on their own. In the end, we had to stop and start
every VM (easily done, it was just alarming). Once rebooted, the VMs
were, of course, fine.
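
A minimal sketch of that stop/start cycle, assuming libvirt-managed guests
(virsh) and that a hard stop is acceptable; the loop below is only an
illustration:

    # cycle every running libvirt domain on this hypervisor
    for dom in $(virsh list --name); do
        virsh destroy "$dom"   # hard stop; 'virsh shutdown' is the graceful variant
        virsh start "$dom"
    done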


So that's interesting. I'm going to try this myself as well, since I think they should continue I/O at some point.

I marked the two full OSDs as down and out. I am a little concerned that
these two are full while the cluster, in general, is only at 50%
capacity. It appears we may have a hot spot. I'm going to look into that
later today. Also, I'm not sure how it happened, but pgp_num is lower
than pg_num; I had not noticed that until last night and will address it
as well. This probably happened when I last resized placement groups, or
potentially when I set up the object storage pools.
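
A minimal sketch of the pgp_num fix, assuming a pool named 'volumes' (pool
name and count are placeholders only):

    ceph osd pool get volumes pg_num        # check the current pg_num
    ceph osd pool get volumes pgp_num       # compare with pgp_num
    ceph osd pool set volumes pgp_num 1024  # raise pgp_num to match pg_num; this triggers data movement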




On Fri, Apr 11, 2014 at 3:49 AM, Wido den Hollander <w...@42on.com
<mailto:w...@42on.com>> wrote:

    On 04/11/2014 09:23 AM, Josef Johansson wrote:


        On 11/04/14 09:07, Wido den Hollander wrote:


                On 11 April 2014 at 8:50, Josef Johansson
                <jo...@oderland.se <mailto:jo...@oderland.se>> wrote:


                Hi,

                On 11/04/14 07:29, Wido den Hollander wrote:

                        On 11 April 2014 at 7:13, Greg Poirier
                        <greg.poir...@opower.com
                        <mailto:greg.poir...@opower.com>> wrote:


                        One thing to note... all of our KVM VMs have to
                        be rebooted. This is something I wasn't
                        expecting. I tried waiting for them to recover
                        on their own, but that's not happening.
                        Rebooting them restores service immediately. :/
                        Not ideal.

                    A reboot isn't really required though. It could be
                    that the VM itself is in trouble, but from a
                    librados/librbd perspective, I/O should simply
                    continue as soon as an osdmap has been received
                    without the "full" flag.

                    It could be that you have to wait some time before
                    the VM continues. This can take up to 15 minutes.
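
                    A quick way to check whether the flag has actually
                    cleared (a sketch; exact output varies by version):

                        ceph health detail          # lists full / near-full OSDs
                        ceph osd dump | grep flags  # 'full' drops off the flags line once cleared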

                With other storage solutions you would have to change
                the timeout value for each disk, e.g. from 60 to 180
                seconds, for the VMs to survive storage problems.
                Does Ceph handle this differently somehow?
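
                For reference, that per-disk timeout is the SCSI-layer
                knob inside the guest, e.g. (assuming a disk named sda
                exposed through a SCSI driver; virtio-blk devices don't
                have this file):

                    cat /sys/block/sda/device/timeout        # commonly 30-60 seconds by default
                    echo 180 > /sys/block/sda/device/timeout # tolerate longer storage stalls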

            It's not that RBD does it differently. Librados simply
            blocks the I/O, and so does librbd, which then causes Qemu
            to block.

            I've seen VMs survive RBD issues for periods longer than 60
            seconds. I gave them some time and they continued again.

            Which exact setting are you talking about? I'm talking about
            a Qemu/KVM VM
            running with a VirtIO drive.

        cat /sys/block/*/device/timeout
        (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009465)

        This file is non-existent for my Ceph VirtIO drive, however, so
        it seems RBD handles this.


    Well, I don't think it's handled by RBD, but VirtIO simply doesn't
    have the timeout. That's probably only in the SCSI driver.

    Wido


        I only have para-virtualized VMs to compare with right now, and
        they don't have it inside the VM, but that's expected. From my
        understanding, it should have been there if it was an HVM.
        Whenever the timeout was reached, an error occurred and the
        disk was set to read-only mode.

        Cheers,
        Josef

            Wido

                Cheers,
                Josef

                    Wido

                        On Thu, Apr 10, 2014 at 10:12 PM, Greg Poirier
                        <greg.poir...@opower.com
                        <mailto:greg.poir...@opower.com>> wrote:

                            Going to try increasing the full ratio. Disk
                            utilization wasn't really
                            growing at an unreasonable pace. I'm going
                            to keep an eye on it for the
                            next couple of hours and down/out the OSDs
                            if necessary.
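
                            A sketch of that down/out step, where
                            <osd-id> is a placeholder for the full
                            OSD's id:

                                ceph osd out <osd-id>  # mark it out so data rebalances off it
                                ceph -w                # watch recovery/backfill progress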

                            We have four more machines that we're in the
                            process of adding (which
                            doubles the number of OSDs), but got held up
                            by some networking nonsense.

                            Thanks for the tips.


                            On Thu, Apr 10, 2014 at 9:51 PM, Sage Weil
                            <s...@inktank.com <mailto:s...@inktank.com>>
                            wrote:

                                On Thu, 10 Apr 2014, Greg Poirier wrote:

                                    Hi,
                                    I have about 200 VMs with a common
                                    RBD volume as their root filesystem
                                    and a number of additional
                                    filesystems on Ceph.

                                    All of them have stopped responding.
                                    One of the OSDs in my cluster is
                                    marked full. I tried stopping that
                                    OSD to force things to rebalance or
                                    at least go to degraded mode, but
                                    nothing is responding still.

                                    I'm not exactly sure what to do or
                                    how to investigate. Suggestions?

                                Try marking the osd out or partially out
                                (ceph osd reweight N .9) to move
                                some data off, and/or adjust the full
                                ratio up (ceph pg set_full_ratio
                                .95).  Note that this becomes
                                increasingly dangerous as OSDs get closer to
                                full; add some disks.
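
                                For example (osd.12 is only a
                                placeholder for whichever OSD 'ceph
                                health detail' reports as full):

                                    ceph health detail          # identify the full OSD(s)
                                    ceph osd reweight 12 0.9    # move roughly 10% of that OSD's data elsewhere
                                    ceph pg set_full_ratio .95  # temporarily raise the full threshold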

                                sage













--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
