We are testing a Ceph cluster, mostly via CephFS, backed by an
erasure-coded pool that we have been loading with data. Recently,
`ceph status` started reporting HEALTH_ERR. We stopped all activity to
the filesystem and waited to see whether the error would clear on its
own; it didn't. We then tried a few suggestions from the internet (ceph
pg repair, ceph pg scrub, ceph pg deep-scrub), to no avail. I'm not sure
how to find out more about what the problem actually is, or how to
repair things and bring the cluster back to normal health. Any
suggestions?
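For reference, these are the exact commands we ran (the pg id `2.798` is the inconsistent pg reported by `ceph health detail` below):

```shell
# Force scrubs of the flagged pg, then ask the primary OSD to repair it.
ceph pg deep-scrub 2.798
ceph pg scrub 2.798
ceph pg repair 2.798
```

After each command the pg eventually returned to active+clean+inconsistent and the scrub-error count stayed at 5.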

Current status:

# ceph -s
  cluster:
    id:     28ef32f1-4350-491b-9003-b19b9c3a2076
    health: HEALTH_ERR
            5 scrub errors
            Possible data damage: 1 pg inconsistent

  services:
    mon: 3 daemons, quorum gi-cba-01,gi-cba-02,gi-cba-03
    mgr: gi-cba-01(active), standbys: gi-cba-02, gi-cba-03
    mds: backups-1/1/1 up  {0=gi-cbmd=up:active}
    osd: 87 osds: 87 up, 87 in

  data:
    pools:   2 pools, 4096 pgs
    objects: 90.98 M objects, 134 TiB
    usage:   210 TiB used, 845 TiB / 1.0 PiB avail
    pgs:     4088 active+clean
             5    active+clean+scrubbing+deep
             2    active+clean+scrubbing
             1    active+clean+inconsistent

# ceph health detail
HEALTH_ERR 5 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 5 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 2.798 is active+clean+inconsistent, acting [41,50,17,2,86,70,61]
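One thing I found while searching that I haven't tried yet (a sketch; I'm not sure it's the right next step, and the exact output format varies by Ceph release): `rados list-inconsistent-obj` is supposed to show which objects and which shards/OSDs in the pg failed the scrub, which might tell us whether the damage is on one OSD or spread across several:

```shell
# Show the objects in the inconsistent pg that failed scrub,
# with per-shard error details (run after a fresh deep-scrub).
rados list-inconsistent-obj 2.798 --format=json-pretty
```

Would the output of this help diagnose whether a repair is safe on an erasure-coded pool?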
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com