Hi Sean,
Many thanks for the suggestion, but unfortunately deep-scrub also
appears to be ignored:
# ceph pg deep-scrub 4.ff
instructing pg 4.ffs0 on osd.318 to deep-scrub
'tail -f ceph-osd.318.log' shows no new entries.
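For completeness, the usual suspects that can silently suppress scrubs
(cluster-wide flags, the scrub time window, and the per-OSD scrub
concurrency limit) can be checked as below; I'm not certain an
operator-requested scrub even honours the time window, so treat this
as ruling things out rather than a fix:

# ceph osd dump | grep flags
# ceph daemon osd.318 config get osd_scrub_begin_hour
# ceph daemon osd.318 config get osd_scrub_end_hour
# ceph daemon osd.318 config get osd_max_scrubs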
To get more info, I set debug level 10 on the OSD and issued another
repair command:
# ceph daemon osd.318 config set debug_osd 10
# ceph pg repair 4.ff
instructing pg 4.ffs0 on osd.318 to repair
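(If the admin socket isn't handy, I believe the same debug level can
be set remotely with
# ceph tell osd.318 injectargs '--debug_osd 10'
but I used the daemon socket here.)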
Tailing the OSD log showed what might be an appropriate response:
2018-07-04 13:54:44.181 7faaaeaa8700 10 osd.318 pg_epoch: 180138
pg[4.ffs0( v 180138'5043225 (180078'5040201,180138'5043225]
local-lis/les=179843/179844 n=124423 ec=735/735 lis/c 179843/179843
les/c/f 179844/180011/0 179841/179843/174426)
[318,403,150,13,225,261,382,175,282,324]p318(0) r=0 lpr=179843
crt=180138'5043225 lcod 180138'5043224 mlcod 180138'5043224
active+clean+inconsistent MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB ps=926]
state<Started/Primary>: marking for scrub
However, even though the primary has marked the pg for scrub, the
scrub still doesn't start...
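(One thought: given the ~0.18% misplaced objects on this cluster, I
wonder if recovery is blocking the scrub; as far as I know
osd_scrub_during_recovery defaults to false, and can be checked with
# ceph daemon osd.318 config get osd_scrub_during_recovery
though I don't know whether that also gates explicitly requested
scrubs.)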
# ceph pg 4.ff query
shows (excerpt):
"last_deep_scrub_stamp": "2018-07-01 18:00:41.769956",
"last_clean_scrub_stamp": "2018-06-27 05:55:13.023760",
"num_scrub_errors": 23,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 23,
"scrub": {
"scrubber.epoch_start": "178857",
"scrubber.active": false,
"scrubber.state": "INACTIVE",
"scrubber.start": "MIN",
"scrubber.end": "MIN",
"scrubber.max_end": "MIN",
"scrubber.subset_last_update": "0'0",
"scrubber.deep": false,
"scrubber.waiting_on_whom": []
Beyond trying that, I'm not sure where to go from here :(
Jake
On 04/07/18 01:14, Sean Redmond wrote:
> do a deep-scrub instead of just a scrub
>
> On Tue, 3 Jul 2018, 12:37 Jake Grimmett, <[email protected]> wrote:
>
> Dear All,
>
> Sorry to bump the thread, but I still can't manually repair inconsistent
> pgs on our Mimic cluster (13.2.0, upgraded from 12.2.5)
>
> There are many similarities to an unresolved bug:
>
> http://tracker.ceph.com/issues/15781
>
> To give more examples of the problem:
>
> The following commands appear to run OK, but *nothing* appears in the
> OSD log to indicate that the commands are running. The OSDs are
> otherwise working and logging OK.
>
> # ceph pg scrub 4.e19
> instructing pg 4.e19s0 on osd.246 to scrub
>
> # ceph pg repair 4.e19
> instructing pg 4.e19s0 on osd.246 to repair
>
> # ceph osd scrub 246
> instructed osd(s) 246 to scrub
>
> # ceph osd repair 246
> instructed osd(s) 246 to repair
>
> It does not matter which osd or pg the repair is initiated on.
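>
> (To kick all seven inconsistent pgs at once, a naive loop over the
> 'ceph health detail' output below should work, e.g.
>
> # ceph health detail | awk '/inconsistent, acting/ {print $2}' \
>     | while read pg; do ceph pg repair "$pg"; done
>
> though of course that doesn't help while the commands are ignored.)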
>
> This command also fails:
> # rados list-inconsistent-obj 4.e19
> No scrub information available for pg 4.e19
> error 2: (2) No such file or directory
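>
> (My understanding is that this just means no scrub results are
> recorded for the pg in the current interval, which fits with the
> scrubs never actually running; once a deep-scrub completes, the same
> command should return JSON, e.g.
>
> # rados list-inconsistent-obj 4.e19 --format=json-pretty
> )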
>
> From the OSD logs and 'ceph -s', I can see that the OSDs are still
> doing automatic background pg scrubs, just not the ones I have asked
> them to do; at the time of my requests they are not otherwise scrubbing.
>
> Could it be that my commands are not being sent to the OSDs?
>
> Any idea on how to debug this?
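>
> One thing I plan to try is raising message-level debugging, to see
> whether the scrub command even reaches the OSD (level 1 is a guess,
> it may need to be higher):
>
> # ceph tell osd.246 injectargs '--debug_ms 1'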
>
> ...
>
> Further info:
>
> Output of 'ceph pg 4.e19 query' is here:
> http://p.ip.fi/9x5v
>
> Output of 'ceph daemon osd.246 config show' is here
> http://p.ip.fi/RAuk
>
> Cluster has 10 nodes, 128GB RAM, dual Xeon
> 450 Bluestore SATA OSDs, EC 8+2
> 4 NVMe OSDs, replicated
> used for cephfs (2.3PB), daily snapshots only
>
> # ceph health detail
> HEALTH_ERR 9500031/5149746146 objects misplaced (0.184%); 80 scrub
> errors; Possible data damage: 7 pgs inconsistent
> OBJECT_MISPLACED 9500031/5149746146 objects misplaced (0.184%)
> OSD_SCRUB_ERRORS 80 scrub errors
> PG_DAMAGED Possible data damage: 7 pgs inconsistent
> pg 4.ff is active+clean+inconsistent, acting
> [318,403,150,13,225,261,382,175,282,324]
> pg 4.2e2 is active+clean+inconsistent, acting
> [352,59,328,451,195,119,42,66,158,150]
> pg 4.551 is active+clean+inconsistent, acting
> [391,105,124,150,205,22,269,184,293,91]
> pg 4.61c is active+clean+inconsistent, acting
> [382,131,84,35,282,214,236,366,309,150]
> pg 4.8cd is active+clean+inconsistent, acting
> [353,58,5,252,187,183,323,150,387,32]
> pg 4.a20 is active+clean+inconsistent, acting
> [346,104,398,282,225,133,150,70,165,17]
> pg 4.e19 is active+clean+inconsistent, acting
> [246,447,245,98,170,348,111,155,150,295]
>
> again, thanks for any advice,
>
> Jake
--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
Phone 01223 267019
Mobile 0776 9886539
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com