That's exactly what I am doing -- the only difference is that I didn't need
to do step 1, since for me the device was already mounted in
/var/lib/ceph/ceph-### -- but the remaining steps are exactly what I am doing.
It seems to me that in my case the PG got corrupted in all copies, and that's
what is causing the OSD to refuse to start with the imported PG.
Glad it worked for you.
I am now marking the PGs as complete (data loss).
It will be a long night today :)
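For anyone following along, the mark-complete step looks roughly like this (a sketch only: the OSD id, pg id, and data path are placeholders taken from the thread, it assumes a ceph-objectstore-tool build that supports the mark-complete op, and the helper only prints the commands so they can be reviewed before touching a live cluster):

```shell
OSD=116          # placeholder: an OSD holding one of the incomplete PGs
PGID="15.371"    # placeholder pg id from the thread
DATA_PATH="/var/lib/ceph/osd/ceph-${OSD}"

# dry-run helper: print each command instead of executing it
run() { echo "$@"; }

# the OSD must be stopped before ceph-objectstore-tool touches its store
run systemctl stop "ceph-osd@${OSD}"

# mark the PG complete with whatever objects this copy has --
# anything missing from this copy is permanently lost (hence "data loss")
run ceph-objectstore-tool --data-path "$DATA_PATH" --pgid "$PGID" --op mark-complete

run systemctl start "ceph-osd@${OSD}"
```

Swap `echo` for real execution only after checking the commands against the actual OSD ids.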
On 7/22/2017 8:03 PM, Daniel K wrote:
I am in the process of doing exactly what you are -- this worked for me:
1. mount the first partition of the bluestore drive that holds the
missing PGs (if it's not already mounted)
> mkdir /mnt/tmp
> mount /dev/sdb1 /mnt/tmp
2. export the pg to a suitable temporary storage location:
> ceph-objectstore-tool --data-path /mnt/tmp --pgid 1.24 --op export
--file /mnt/sdd1/recover.1.24
3. find the acting osd
> ceph health detail |grep incomplete
PG_DEGRADED Degraded data redundancy: 23 pgs unclean, 23 pgs incomplete
pg 1.24 is incomplete, acting [18,13]
pg 4.1f is incomplete, acting [11]
...
4. set noout
> ceph osd set noout
5. Find the OSD and log into it -- I used 18 here.
> ceph osd find 18
{
"osd": 18,
"ip": "10.0.15.54:6801/9263",
"crush_location": {
"building": "building-dc",
"chassis": "chassis-dc400f5-10",
"city": "city",
"floor": "floor-dc4",
"host": "stor-vm4",
"rack": "rack-dc400f5",
"region": "cfl",
"room": "room-dc400",
"root": "default",
"row": "row-dc400f"
}
}
> ssh [email protected] <mailto:[email protected]>
6. copy the file to somewhere accessible by the new (acting) OSD
> scp [email protected]:/mnt/sdd1/recover.1.24 /tmp/recover.1.24
7. stop the osd
> service ceph-osd@18 stop
8. import the file using ceph-objectstore-tool
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-18 --op
import --file /tmp/recover.1.24
9. start the osd
> service ceph-osd@18 start
This worked for me -- I'm not sure whether it's the best way or whether I
took any unnecessary steps, and I have yet to validate that the data is good.
I based this partially off your original email, and the guide here
http://ceph.com/geen-categorie/incomplete-pgs-oh-my/
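One rough way to do that validation afterwards (a sketch; the pg id 1.24 is just the example from step 3, and the helper prints the commands rather than executing them):

```shell
PGID="1.24"      # placeholder: one of the recovered PGs

# dry-run helper: print each command instead of executing it
run() { echo "$@"; }

# the imported PGs should no longer appear in the incomplete list
run ceph health detail

# 'state' in the query output should settle on active+clean
run ceph pg "$PGID" query

# a deep-scrub re-reads and checksums the objects in the PG
run ceph pg deep-scrub "$PGID"
```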
On Sat, Jul 22, 2017 at 4:46 PM, mofta7y <[email protected]> wrote:
Hi All,
I have a situation here.
I have an EC pool that has a cache tier pool in front of it (the cache
tier is replicated with size 2).
There was an issue on the pool and the crush map got changed after
rebooting some OSDs; in any case, I lost 4 cache tier OSDs.
Those lost OSDs are not really lost -- they look fine to me, but
bluestore throws an exception when starting them that I can't deal
with. (I will open a separate question about that exception as well.)
So now I have 14 incomplete PGs on the caching tier.
I am trying to recover them using ceph-objectstore-tool.
The extraction and import work nicely with no issues, but the OSD
fails to start afterwards with the same issue as the original OSD.
After importing the PG on the acting OSD I get the exact same
exception I was getting while trying to start the failed OSD;
removing that import resolves the issue.
So the question is: how can I use ceph-objectstore-tool to import into
bluestore? I think I am missing something here.
Here is the procedure and the steps I used:
1- stop the old OSD (it cannot start anyway)
2- use this command to extract the pg I need:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-116
--pgid 15.371 --op export --file /tmp/recover.15.371
that command works
3- check which OSD is the acting OSD for the pg
4- stop the acting OSD
5- delete the current folder with the same pg name
6- use this command:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-78 --op
import --file /tmp/recover.15.371
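Steps 3-5 spelled out as commands might look like this (a sketch: OSD 78 and pg 15.371 are from the thread, the helper only prints the commands, and the directory removal in step 5 assumes a filestore-style current/<pgid>_head layout, which would not exist on a pure bluestore data path):

```shell
PGID="15.371"    # pg id from the thread
ACTING=78        # placeholder: acting OSD found in step 3

# dry-run helper: print each command instead of executing it
run() { echo "$@"; }

# step 3: the pg map shows the up/acting OSD sets for the PG
run ceph pg map "$PGID"

# step 4: stop the acting OSD before importing into it
run systemctl stop "ceph-osd@${ACTING}"

# step 5: remove the existing copy of the PG (filestore layout assumed)
run rm -rf "/var/lib/ceph/osd/ceph-${ACTING}/current/${PGID}_head"
```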
The error I got in both cases is this bluestore error:
Jul 22 16:35:20 alm9 ceph-osd[3799171]: -257> 2017-07-22
16:20:19.544195 7f7157036a40 -1 osd.116 119691 log_to_monitors
{default=true}
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 0> 2017-07-22
16:35:20.142143 7f713c597700 -1
/tmp/buildd/ceph-11.2.0/src/os/bluestore/BitMapAllocator.cc: In
function 'virtual int BitMapAllocator::reserve(uint64_t)' thread
7f713c597700 time 2017-07-22 16:35:20.139309
Jul 22 16:35:20 alm9 ceph-osd[3799171]:
/tmp/buildd/ceph-11.2.0/src/os/bluestore/BitMapAllocator.cc: 82:
FAILED assert(!(need % m_block_size))
Jul 22 16:35:20 alm9 ceph-osd[3799171]: ceph version 11.2.0
(f223e27eeb35991352ebc1f67423d4ebc252adb7)
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 1:
(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x80) [0x562b84558380]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 2:
(BitMapAllocator::reserve(unsigned long)+0x2ab) [0x562b8437c5cb]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 3:
(BlueFS::reclaim_blocks(unsigned int, unsigned long,
std::vector<AllocExtent,
mempool::pool_allocator<(mempool::pool_index_t)7, AllocExtent>
>*)+0x22a) [0x562b8435109a]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 4:
(BlueStore::_balance_bluefs_freespace(std::vector<bluestore_pextent_t,
std::allocator<bluestore_pextent_t> >*)+0x28e) [0x562b84270dae]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 5:
(BlueStore::_kv_sync_thread()+0x164a) [0x562b84273eea]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 6:
(BlueStore::KVSyncThread::entry()+0xd) [0x562b842ad9dd]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 7: (()+0x76ba)
[0x7f71560c76ba]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 8: (clone()+0x6d)
[0x7f71547953dd]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: NOTE: a copy of the
executable, or `objdump -rdS <executable>` is needed to interpret
this.
If anyone has any idea how to restore those PGs, please point me
in the right direction.
By the way, manually restoring the folder that I deleted in step 5
makes the OSD go up again.
Thanks
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com