Christoph, do you have a link to the bug?
On Fri, Dec 21, 2018 at 11:07 AM Christoph Adomeit <
[email protected]> wrote:
> Hi,
>
> Same here, but also for PGs in cephfs pools.
>
> As far as I know, there is a known bug where, under memory pressure, some
> reads return zero, which leads to this error message.
>
> I have set the nodeep-scrub flag and I am waiting for 12.2.11.
>
> Thanks
> Christoph
>
> On Fri, Dec 21, 2018 at 03:23:21PM +0100, Hervé Ballans wrote:
> > Hi Frank,
> >
> > I am seeing exactly the same issue, with the same disks as yours. Every
> > day, after a batch of deep-scrub operations, there are generally between 1
> > and 3 inconsistent PGs, and they are on different OSDs.
> >
> > This could point to a problem with these disks, but:
> >
> > - it only affects the PGs of the rbd pool, not those of the cephfs pools
> > (the same disk model is used)
> >
> > - I saw this when I was running 12.2.5, not after I upgraded to 12.2.8,
> > but the problem appeared again after the upgrade to 12.2.10
> >
> > - on my side, smartctl and dmesg do not show any media errors, so I'm
> > pretty sure the physical media is not the cause...
> >
> > One detail: each disk is configured as a RAID0 on a PERC740P. Is this
> > also the case for you, or are your disks in JBOD mode?
> >
> > Another question: in your case, is the OSD involved in the inconsistent
> > PGs always the same one, or is it a different one every time?
> >
> > For what it's worth, running 'ceph pg repair' manually currently works
> > well each time...
> >
> > Context: Luminous 12.2.10, BlueStore OSDs with the data block on SATA disks
> > and WAL/DB on NVMe, rbd configuration replica 3/2
> >
> > Cheers,
> > rv
> >
> > Some outputs:
> >
> > $ sudo ceph -s
> >   cluster:
> >     id:     838506b7-e0c6-4022-9e17-2d1cf9458be6
> >     health: HEALTH_ERR
> >             3 scrub errors
> >             Possible data damage: 3 pgs inconsistent
> >
> >   services:
> >     mon: 3 daemons, quorum inf-ceph-mon0,inf-ceph-mon1,inf-ceph-mon2
> >     mgr: inf-ceph-mon0(active), standbys: inf-ceph-mon1, inf-ceph-mon2
> >     mds: cephfs_home-2/2/2 up {0=inf-ceph-mon1=up:active,1=inf-ceph-mon0=up:active}, 1 up:standby
> >     osd: 126 osds: 126 up, 126 in
> >
> >   data:
> >     pools:   3 pools, 4224 pgs
> >     objects: 23.35M objects, 20.9TiB
> >     usage:   64.9TiB used, 136TiB / 201TiB avail
> >     pgs:     4221 active+clean
> >              3    active+clean+inconsistent
> >
> >   io:
> >     client: 2.62KiB/s rd, 2.25MiB/s wr, 0op/s rd, 118op/s wr
> >
> > $ sudo ceph health detail
> > HEALTH_ERR 3 scrub errors; Possible data damage: 3 pgs inconsistent
> > OSD_SCRUB_ERRORS 3 scrub errors
> > PG_DAMAGED Possible data damage: 3 pgs inconsistent
> > pg 9.27 is active+clean+inconsistent, acting [78,107,96]
> > pg 9.260 is active+clean+inconsistent, acting [84,113,62]
> > pg 9.6b9 is active+clean+inconsistent, acting [79,107,80]
> > $ sudo rados list-inconsistent-obj 9.27 --format=json-pretty | grep error
> > "errors": [],
> > "union_shard_errors": [
> > "read_error"
> > "errors": [
> > "read_error"
> > "errors": [],
> > "errors": [],
> > $ sudo rados list-inconsistent-obj 9.260 --format=json-pretty | grep error
> > "errors": [],
> > "union_shard_errors": [
> > "read_error"
> > "errors": [],
> > "errors": [],
> > "errors": [
> > "read_error"
> > $ sudo rados list-inconsistent-obj 9.6b9 --format=json-pretty | grep error
> > "errors": [],
> > "union_shard_errors": [
> > "read_error"
> > "errors": [
> > "read_error"
> > "errors": [],
> > "errors": [],
> > $ sudo ceph pg repair 9.27
> > instructing pg 9.27 on osd.78 to repair
> > $ sudo ceph pg repair 9.260
> > instructing pg 9.260 on osd.84 to repair
> > $ sudo ceph pg repair 9.6b9
> > instructing pg 9.6b9 on osd.79 to repair
> > $ sudo ceph -s
> >   cluster:
> >     id:     838506b7-e0c6-4022-9e17-2d1cf9458be6
> >     health: HEALTH_OK
> >
> >   services:
> >     mon: 3 daemons, quorum inf-ceph-mon0,inf-ceph-mon1,inf-ceph-mon2
> >     mgr: inf-ceph-mon0(active), standbys: inf-ceph-mon1, inf-ceph-mon2
> >     mds: cephfs_home-2/2/2 up {0=inf-ceph-mon1=up:active,1=inf-ceph-mon0=up:active}, 1 up:standby
> >     osd: 126 osds: 126 up, 126 in
> >
> >   data:
> >     pools:   3 pools, 4224 pgs
> >     objects: 23.35M objects, 20.9TiB
> >     usage:   64.9TiB used, 136TiB / 201TiB avail
> >     pgs:     4224 active+clean
> >
> >   io:
> >     client: 195KiB/s rd, 7.19MiB/s wr, 17op/s rd, 127op/s wr
> >
> >
> >
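The question of whether the same OSD is involved every time can be answered from the same JSON that is being grepped above. What follows is only a rough sketch, not something posted in the thread. It assumes the report layout that `rados list-inconsistent-obj` documents (an `inconsistents` list whose entries carry a `shards` list with `osd` and `errors` fields). The function name and the sample report are made up for illustration.

```python
import json
from collections import Counter


def shard_errors_by_osd(report_json: str) -> Counter:
    """Count, per OSD, the objects whose shard on that OSD reports errors."""
    report = json.loads(report_json)
    counts = Counter()
    for obj in report.get("inconsistents", []):
        for shard in obj.get("shards", []):
            # A shard with a non-empty "errors" list is the suspect copy.
            if shard.get("errors"):
                counts[shard["osd"]] += 1
    return counts


# Made-up sample shaped like the grepped output: one object, three shards,
# one of which reports a read_error. The OSD ids are illustrative only.
SAMPLE = json.dumps({
    "epoch": 1,
    "inconsistents": [{
        "object": {"name": "example-object"},
        "errors": [],
        "union_shard_errors": ["read_error"],
        "shards": [
            {"osd": 78, "errors": ["read_error"]},
            {"osd": 96, "errors": []},
            {"osd": 107, "errors": []},
        ],
    }],
})

if __name__ == "__main__":
    for osd, n in shard_errors_by_osd(SAMPLE).most_common():
        print(f"osd.{osd}: {n} object(s) with shard errors")
```

In practice the input would be the saved output of `rados list-inconsistent-obj <pgid> --format=json-pretty` for each inconsistent PG, with the counters summed over several days before drawing any conclusion about a particular OSD.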
> > On 19/12/2018 at 04:48, Frank Ritchie wrote:
> > >Hi all,
> > >
> > >I have been receiving alerts for:
> > >
> > >Possible data damage: 1 pg inconsistent
> > >
> > >almost daily for a few weeks now. When I check:
> > >
> > >rados list-inconsistent-obj $PG --format=json-pretty
> > >
> > >I will always see a read_error. When I run a deep scrub on the PG I will
> > >see:
> > >
> > >head candidate had a read error
> > >
> > >When I check dmesg on the osd node I see:
> > >
> > >blk_update_request: critical medium error, dev sdX, sector 123
> > >
> > >I will also see a few uncorrected read errors in smartctl.
> > >
> > >Info:
> > >Ceph: ceph version 12.2.4-30.el7cp
> > >OSD: Toshiba 1.8TB SAS 10K
> > >120 OSDs total
> > >
> > >Has anyone else seen these alerts occur almost daily? Can the errors
> > >possibly be due to deep scrubbing too aggressively?
> > >
> > >I realize these errors indicate potential failing drives but I can't
> > >replace a drive daily.
> > >
> > >thx
> > >Frank
> >
> >
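To see whether the medium errors cluster on a few drives or are spread across many, the kernel log lines can be tallied per device. This is a minimal sketch under assumptions: the regular expression is modelled only on the single dmesg line quoted above, and the helper name and sample log are hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the line quoted in the thread:
#   blk_update_request: critical medium error, dev sdX, sector 123
# Other kernel versions may word this differently (assumption).
MEDIUM_ERROR = re.compile(r"critical medium error, dev ([^,\s]+), sector (\d+)")


def medium_errors_by_device(log_text: str) -> Counter:
    """Count 'critical medium error' lines per block device."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MEDIUM_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample log; device names and sectors are illustrative only.
SAMPLE_LOG = """\
blk_update_request: critical medium error, dev sdc, sector 123
blk_update_request: critical medium error, dev sdc, sector 456
blk_update_request: critical medium error, dev sdf, sector 789
some unrelated kernel message
"""

if __name__ == "__main__":
    for dev, n in medium_errors_by_device(SAMPLE_LOG).most_common():
        print(f"{dev}: {n} medium error(s)")
```

A device that keeps showing up here and in smartctl is a more likely replacement candidate than one that appears once.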
>
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> No backup - no pity
> Christoph Adomeit
> GATWORKS GmbH
> Reststrauch 191
> 41199 Moenchengladbach
> Registered office: Moenchengladbach
> District court Moenchengladbach, HRB 6303
> Managing directors:
> Christoph Adomeit, Hans Wilhelm Terstappen
>
> [email protected] Internet solutions at their finest
> Phone +49 2166 9149-32 Fax +49 2166 9149-10
>