Hello, Irek.

Look at this please:

root@ceph-osd-1-2:~# rbd -p rbd ls
rbd: pool rbd doesn't contain rbd images

root@ceph-osd-1-2:~# ceph osd dump | grep pool
pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 448 pgp_num 448 last_change 23 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 448 pgp_num 448 last_change 12 owner 0
pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 448 pgp_num 448 last_change 14 owner 0


root@ceph-osd-1-2:~# rados -p rbd ls | less
rb.0.2f64.238e1f29.000000031d19
....
rb.0.3bfe.2ae8944a.00000001c515
rb.0.dcd4f.238e1f29.00000003c183
cloud1-ftp1-block-xfs01.rbd
...
and it stalls after that, because (I think) it hits an incomplete, stale, or otherwise unavailable PG.


31.10.2013 11:35, Ирек Фасихов wrote:
root@ceph-osd-1-2:~# rbd -p POOLNAME ls
rbd: pool rbd doesn't contain rbd images
root@ceph-osd-1-2:~#


2013/10/31 Иван Кудрявцев <[email protected]>

    Hello, List.

    I ran into serious trouble during a Ceph upgrade from bobtail to
    cuttlefish.

    My OSDs started to crash or go stale, and the load average went to
    100+ on the node. After I stopped an OSD, I was unable to start it
    again because of errors. So I started reformatting OSDs and
    eventually found that I have "incomplete" PGs, and it seems to me
    that I have lost actual data. (It seems some of the drives I use
    are buggy: after a node reboot they lost their partition and
    filesystem entirely, as if they were empty, and I had put them
    into all my nodes at the same time.) I use Ceph as the backing
    store for RBD images, so all my images are inaccessible, and the
    cluster health is:

    root@ceph-osd-1-2:~# ceph -s
      cluster e74c9e0b-40ce-45e4-904a-676cbaaf042d
       health HEALTH_WARN 536 pgs backfill; 28 pgs backfilling; 546
    pgs degraded; 19 pgs down; 76 pgs incomplete; 33 pgs peering; 23
    pgs stale; 109 pgs stuck inactive; 23 pgs stuck stale; 673 pgs
    stuck unclean; 11 requests are blocked > 32 sec; recovery
    2769254/9737405 degraded (28.439%);  recovering 76 o/s, 169MB/s;
    mds cluster is degraded; clock skew detected on mon.0, mon.1
       monmap e1: 3 mons at
    {0=10.252.0.3:6789/0,1=10.252.0.4:6789/0,2=10.252.0.2:6789/0},
    election epoch 5486, quorum 0,1,2 0,1,2
       osdmap e52800: 48 osds: 41 up, 41 in
        pgmap v22294835: 1344 pgs: 671 active+clean, 18
    active+remapped+wait_backfill, 150 active+degraded+wait_backfill,
    12 down+peering, 14 stale+peering, 13 active+degraded+backfilling,
    368 active+degraded+remapped+wait_backfill, 7 stale+down+peering,
    74 incomplete, 15 active+degraded+remapped+backfilling, 2
    stale+incomplete; 6883 GB data, 14627 GB used, 31364 GB / 46031 GB
    avail; 2769254/9737405 degraded (28.439%);  recovering 76 o/s,
    169MB/s
       mdsmap e564: 1/1/1 up {0=2=up:replay}, 2 up:standby
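
The health summary and the pgmap line describe the same PGs from two angles, which is easy to miss in the wrapped output. A minimal sketch, assuming each PG state is just a "+"-joined list of flags as printed above: it tallies the pgmap entries and reproduces the counts in the HEALTH_WARN line. The names PGMAP, parse_pgmap and count_with are made up for this illustration.

```python
# The pgmap states copied from the `ceph -s` output above.
PGMAP = (
    "671 active+clean, 18 active+remapped+wait_backfill, "
    "150 active+degraded+wait_backfill, 12 down+peering, 14 stale+peering, "
    "13 active+degraded+backfilling, "
    "368 active+degraded+remapped+wait_backfill, 7 stale+down+peering, "
    "74 incomplete, 15 active+degraded+remapped+backfilling, "
    "2 stale+incomplete"
)


def parse_pgmap(text):
    """Return a list of (pg_count, set_of_state_flags) pairs."""
    entries = []
    for item in text.split(","):
        count, state = item.split()
        entries.append((int(count), set(state.split("+"))))
    return entries


def count_with(entries, flag):
    """Number of PGs whose state contains the given flag."""
    return sum(n for n, flags in entries if flag in flags)


entries = parse_pgmap(PGMAP)
total = sum(n for n, _ in entries)
# PGs without the "active" flag cannot serve I/O at all.
inactive = sum(n for n, flags in entries if "active" not in flags)

print("total PGs:   ", total)
print("not active:  ", inactive)
for flag in ("incomplete", "down", "stale", "peering", "degraded"):
    print("%-13s" % (flag + ":"), count_with(entries, flag))
```

The PGs without the "active" flag are the ones that block client I/O, which would explain why commands touching them hang rather than fail.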

    My images are stored in the rbd pool, but when I try to list them
    I get:

    root@ceph-osd-1-2:~# rbd ls
    rbd: pool rbd doesn't contain rbd images
    root@ceph-osd-1-2:~#

    however, I can access images directly:

    root@ceph-osd-1-2:~# rbd info lun-000000017
    rbd image 'lun-000000017':
            size 10000 MB in 320000 objects
            order 15 (32768 bytes objects)
            block_name_prefix: rb.0.3c07.238e1f29
            format: 1
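
The numbers in this `rbd info` output are consistent with each other, which suggests the image header itself is still readable. A small worked check, assuming "MB" here means 2^20 bytes and that the object size is 2^order bytes (the helper name object_layout is made up for this illustration):

```python
MIB = 1024 * 1024


def object_layout(size_mb, order):
    """Return (object_size_in_bytes, number_of_objects) for an image."""
    object_size = 1 << order          # order 15 -> 32768 bytes
    size_bytes = size_mb * MIB
    # Ceiling division, in case the size is not a multiple of the object size.
    count = -(-size_bytes // object_size)
    return object_size, count


object_size, count = object_layout(10000, 15)
print(object_size, count)   # 32768 320000
```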

    What I would really like is to export the data, even with errors,
    as regular files and then try to check and repair them with fsck.
    As I understand it, I first have to fix the "incomplete" and
    "inactive" PGs somehow, but I don't understand how to do that.
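
One way to think about such an export is as a manual reassembly of the image from its RADOS objects. The following is only a sketch under stated assumptions, not a tested recovery procedure: it assumes the data objects of a format 1 image are named block_name_prefix plus a dot plus a 12-digit hex index (which matches the names in the `rados ls` output in this thread), and that the readable objects have already been saved one by one into a local directory, for example with `rados -p rbd get <object> <file>`. The function names and the directory layout are made up for this illustration.

```python
import os


def object_name(prefix, index):
    """Name of the index-th data object of a format 1 image (assumed layout)."""
    return "%s.%012x" % (prefix, index)


def assemble(prefix, order, object_count, src_dir, out_path):
    """Build a raw image from objects saved as files in src_dir.

    Objects that are not present are left as zero-filled holes and their
    indexes are returned, so the result can be inspected or fsck'ed later.
    """
    object_size = 1 << order
    missing = []
    with open(out_path, "wb") as out:
        # Pre-size the file; unwritten regions read back as zeros.
        out.truncate(object_size * object_count)
        for index in range(object_count):
            path = os.path.join(src_dir, object_name(prefix, index))
            if not os.path.exists(path):
                missing.append(index)
                continue
            with open(path, "rb") as src:
                data = src.read(object_size)
            out.seek(index * object_size)
            out.write(data)
    return missing


# Hypothetical usage for the image shown above:
# missing = assemble("rb.0.3c07.238e1f29", 15, 320000,
#                    "/tmp/lun-000000017-objects", "/tmp/lun-000000017.raw")
```

Note that a missing object does not necessarily mean lost data, since RBD images are thin-provisioned and objects that were never written do not exist. Objects that live in incomplete PGs will most likely block on read rather than return an error, so they cannot be fetched this way until those PGs recover.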

    root@ceph-osd-1-2:~# ceph pg dump | awk '{print $1, $14, $2, $6/1024/1024, $9}' | grep '\.' | grep inc
    dumped all in format plain
    2.1be [43,15,37] 1603 4018.79 incomplete
    2.1aa [20,37,44] 4875 11930.5 incomplete
    0.1a4 [45,37,27] 0 0 incomplete
    2.1a1 [43,45,37] 988 2497.98 incomplete
    2.1a0 [30,29,37] 0 0 incomplete
    0.1a2 [30,29,37] 0 0 incomplete
    1.1a3 [45,37,27] 0 0 incomplete
    2.1a2 [45,37,27] 0 0 incomplete
    1.1a1 [30,29,37] 0 0 incomplete
    2.19d [18,42,24] 0 0 incomplete
    0.19f [18,42,24] 0 0 incomplete
    1.19e [18,42,24] 0 0 incomplete
    0.198 [1,43,37] 0 0 incomplete
    1.197 [1,43,37] 0 0 incomplete
    2.196 [1,43,37] 0 0 incomplete
    2.183 [28,30,37] 0 0 incomplete
    2.15f [27,32,37] 0 0 incomplete
    0.15c [42,31,37] 0 0 incomplete
    1.15b [42,31,37] 0 0 incomplete
    2.15a [42,31,37] 0 0 incomplete
    2.137 [7,30,37] 0 0 incomplete
    0.130 [41,31,25] 139 446.865 stale+incomplete
    0.12d [31,37,27] 0 0 incomplete
    1.12c [31,37,27] 0 0 incomplete
    2.12b [31,37,27] 0 0 incomplete
    0.124 [30,37,42] 0 0 incomplete
    1.123 [30,37,42] 0 0 incomplete
    2.122 [30,37,42] 0 0 incomplete
    0.109 [31,37,27] 0 0 incomplete
    1.108 [31,37,27] 0 0 incomplete
    2.107 [31,37,27] 0 0 incomplete
    2.f4 [12,37,20] 3132 7941.63 incomplete
    2.ee [12,11,37] 3959 9825.61 incomplete
    0.ec [32,37,42] 0 0 incomplete
    1.eb [32,37,42] 0 0 incomplete
    2.ea [32,37,42] 0 0 incomplete
    2.de [45,37,21] 1228 2960.71 incomplete
    2.d8 [9,28,37] 832 2029.15 incomplete
    2.c1 [13,30,37] 0 0 incomplete
    2.b9 [32,2,37] 2971 7305.59 incomplete
    2.a4 [30,22,33] 1343 3365.41 incomplete
    2.9e [43,37,45] 651 1633.18 incomplete
    2.9a [30,2,37] 0 0 incomplete
    2.95 [6,43,37] 0 0 incomplete
    1.96 [6,43,37] 0 0 incomplete
    0.97 [6,43,37] 4 8.00209 incomplete
    2.91 [47,28,1] 4518 11066.5 stale+incomplete
    0.64 [32,37,8] 0 0 incomplete
    1.63 [32,37,8] 0 0 incomplete
    2.62 [32,37,8] 0 0 incomplete
    0.59 [32,37,21] 311 1010.57 incomplete
    2.55 [27,16,37] 0 0 incomplete
    0.57 [27,16,37] 0 0 incomplete
    1.56 [27,16,37] 0 0 incomplete
    2.57 [32,37,21] 42 72.6953 incomplete
    2.4c [11,37,30] 3315 8320.87 incomplete
    0.49 [18,7,37] 0 0 incomplete
    1.48 [18,7,37] 0 0 incomplete
    2.47 [18,7,37] 0 0 incomplete
    2.3d [28,44,37] 0 0 incomplete
    0.3f [28,44,37] 0 0 incomplete
    1.3e [28,44,37] 0 0 incomplete
    2.2a [43,16,37] 5379 13236.4 incomplete
    0.18 [32,42,37] 0 0 incomplete
    1.17 [32,42,37] 0 0 incomplete
    2.16 [32,42,37] 0 0 incomplete
    2.d [45,29,37] 0 0 incomplete
    1.e [45,29,37] 0 0 incomplete
    0.f [45,29,37] 0 0 incomplete
    2.c [18,20,37] 1208 3002.21 incomplete
    2.9 [45,37,27] 0 0 incomplete
    1.a [45,37,27] 0 0 incomplete
    0.b [45,37,27] 28 80.5886 incomplete
    2.7 [22,37,30] 2889 7272.09 incomplete
    0.1 [31,37,27] 39 135.707 incomplete
    1.0 [31,37,27] 0 0 incomplete
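
A listing like this is easier to read once it is counted per OSD, since a single OSD that shows up in most of the incomplete PGs is a natural first suspect. A minimal sketch, assuming the second column is the OSD set of each PG and using only an excerpt of the lines above (EXCERPT and osd_frequency are names made up for this illustration):

```python
import re
from collections import Counter

# A few lines copied from the listing above, not the full output.
EXCERPT = """\
2.1be [43,15,37] 1603 4018.79 incomplete
2.1aa [20,37,44] 4875 11930.5 incomplete
0.1a4 [45,37,27] 0 0 incomplete
2.19d [18,42,24] 0 0 incomplete
0.130 [41,31,25] 139 446.865 stale+incomplete
2.f4 [12,37,20] 3132 7941.63 incomplete
2.91 [47,28,1] 4518 11066.5 stale+incomplete
2.2a [43,16,37] 5379 13236.4 incomplete
"""


def osd_frequency(text):
    """Count how many listed PGs each OSD id appears in."""
    counts = Counter()
    for line in text.splitlines():
        match = re.search(r"\[([\d,]+)\]", line)
        if match:
            counts.update(int(osd) for osd in match.group(1).split(","))
    return counts


osd_counts = osd_frequency(EXCERPT)
for osd, n in osd_counts.most_common(3):
    print("osd.%d appears in %d of the listed PGs" % (osd, n))
```

In the full listing, OSD 37 appears in most of the sets, so its state and logs would be worth checking first.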



    _______________________________________________
    ceph-users mailing list
    [email protected] <mailto:[email protected]>
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Best regards, Фасихов Ирек Нургаязович
Mobile: +79229045757
