That's exactly what I am doing -- the only difference is that I didn't need
to do step 1, since for me the device was already mounted in
/var/lib/ceph/ceph-### -- but the remaining steps are exactly what I am doing.
It seems to me that in my case the PG got corrupted in all copies, and that's
what is causing the OSD to refuse to start with the imported PG.
Glad it worked for you.
I am now marking the PGs as complete (data loss).
It will be a long night today :)
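For anyone following along, the mark-complete step looks roughly like this (a sketch only: the OSD id, pg id, and data path are placeholders taken from the thread, it assumes a ceph-objectstore-tool build that supports the mark-complete op, and the helper only prints the commands so they can be reviewed before touching a live cluster):

```shell
OSD=116          # placeholder: an OSD holding one of the incomplete PGs
PGID="15.371"    # placeholder pg id from the thread
DATA_PATH="/var/lib/ceph/osd/ceph-${OSD}"

# dry-run helper: print each command instead of executing it
run() { echo "$@"; }

# the OSD must be stopped before ceph-objectstore-tool touches its store
run systemctl stop "ceph-osd@${OSD}"

# mark the PG complete with whatever objects this copy has --
# anything missing from this copy is permanently lost (hence "data loss")
run ceph-objectstore-tool --data-path "$DATA_PATH" --pgid "$PGID" --op mark-complete

run systemctl start "ceph-osd@${OSD}"
```

Swap `echo` for real execution only after checking the commands against the actual OSD ids.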
On 7/22/2017 8:03 PM, Daniel K wrote:
I am in the process of doing exactly what you are -- this worked for me:
1. mount the first partition of the bluestore drive that holds the
missing PGs (if it's not already mounted)
> mkdir /mnt/tmp
> mount /dev/sdb1 /mnt/tmp
2. export the pg to a suitable temporary storage location:
> ceph-objectstore-tool --data-path /mnt/tmp --pgid 1.24 --op export
--file /mnt/sdd1/recover.1.24
3. find the acting osd
> ceph health detail |grep incomplete
PG_DEGRADED Degraded data redundancy: 23 pgs unclean, 23 pgs incomplete
pg 1.24 is incomplete, acting [18,13]
pg 4.1f is incomplete, acting [11]
...
4. set noout
> ceph osd set noout
5. Find the OSD and log into it -- I used 18 here.
> ceph osd find 18
{
"osd": 18,
"ip": "10.0.15.54:6801/9263",
"crush_location": {
"building": "building-dc",
"chassis": "chassis-dc400f5-10",
"city": "city",
"floor": "floor-dc4",
"host": "stor-vm4",
"rack": "rack-dc400f5",
"region": "cfl",
"room": "room-dc400",
"root": "default",
"row": "row-dc400f"
}
}
> ssh [email protected] <mailto:[email protected]>
6. copy the file to somewhere accessible by the new (acting) OSD
> scp [email protected]:/mnt/sdd1/recover.1.24 /tmp/recover.1.24
7. stop the osd
> service ceph-osd@18 stop
8. import the file using ceph-objectstore-tool
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-18 --op
import --file /tmp/recover.1.24
9. start the osd
> service ceph-osd@18 start
This worked for me -- I'm not sure whether it's the best way or whether I
took any unnecessary steps, and I have yet to validate that the data is good.
I based this partially off your original email, and the guide here
http://ceph.com/geen-categorie/incomplete-pgs-oh-my/
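One rough way to do that validation afterwards (a sketch; the pg id 1.24 is just the example from step 3, and the helper prints the commands rather than executing them):

```shell
PGID="1.24"      # placeholder: one of the recovered PGs

# dry-run helper: print each command instead of executing it
run() { echo "$@"; }

# the imported PGs should no longer appear in the incomplete list
run ceph health detail

# 'state' in the query output should settle on active+clean
run ceph pg "$PGID" query

# a deep-scrub re-reads and checksums the objects in the PG
run ceph pg deep-scrub "$PGID"
```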
On Sat, Jul 22, 2017 at 4:46 PM, mofta7y <[email protected]> wrote:
Hi All,
I have a situation here.
I have an EC pool that has a cache tier pool in front of it (the cache
tier is replicated with size 2).
There was an issue on the pool and the crush map got changed after
rebooting some OSDs; in any case, I lost 4 cache tier OSDs.
Those lost OSDs are not really lost -- they look fine to me, but
bluestore throws an exception when starting them that I can't deal
with. (I will open a separate question about that exception as well.)
So now I have 14 incomplete PGs on the caching tier.
I am trying to recover them using ceph-objectstore-tool.
The extraction and import work nicely with no issues, but the OSD
fails to start afterwards with the same issue as the original OSD.
After importing the PG on the acting OSD I get the exact same
exception I was getting while trying to start the failed OSD;
removing that import resolves the issue.
So the question is: how can I use ceph-objectstore-tool to import into
bluestore? I think I am missing something here.
Here is the procedure and the steps I used:
1- stop the old OSD (it cannot start anyway)
2- use this command to extract the pg I need:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-116
--pgid 15.371 --op export --file /tmp/recover.15.371
that command works
3- check which OSD is the acting OSD for the pg
4- stop the acting OSD
5- delete the current folder with the same pg name
6- use this command:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-78 --op
import --file /tmp/recover.15.371
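Steps 3-5 spelled out as commands might look like this (a sketch: OSD 78 and pg 15.371 are from the thread, the helper only prints the commands, and the directory removal in step 5 assumes a filestore-style current/<pgid>_head layout, which would not exist on a pure bluestore data path):

```shell
PGID="15.371"    # pg id from the thread
ACTING=78        # placeholder: acting OSD found in step 3

# dry-run helper: print each command instead of executing it
run() { echo "$@"; }

# step 3: the pg map shows the up/acting OSD sets for the PG
run ceph pg map "$PGID"

# step 4: stop the acting OSD before importing into it
run systemctl stop "ceph-osd@${ACTING}"

# step 5: remove the existing copy of the PG (filestore layout assumed)
run rm -rf "/var/lib/ceph/osd/ceph-${ACTING}/current/${PGID}_head"
```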
The error I got in both cases is this bluestore error:
Jul 22 16:35:20 alm9 ceph-osd[3799171]: -257> 2017-07-22
16:20:19.544195 7f7157036a40 -1 osd.116 119691 log_to_monitors
{default=true}
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 0> 2017-07-22
16:35:20.142143 7f713c597700 -1
/tmp/buildd/ceph-11.2.0/src/os/bluestore/BitMapAllocator.cc: In
function 'virtual int BitMapAllocator::reserve(uint64_t)' thread
7f713c597700 time 2017-07-22 16:35:20.139309
Jul 22 16:35:20 alm9 ceph-osd[3799171]:
/tmp/buildd/ceph-11.2.0/src/os/bluestore/BitMapAllocator.cc: 82:
FAILED assert(!(need % m_block_size))
Jul 22 16:35:20 alm9 ceph-osd[3799171]: ceph version 11.2.0
(f223e27eeb35991352ebc1f67423d4ebc252adb7)
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 1:
(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x80) [0x562b84558380]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 2:
(BitMapAllocator::reserve(unsigned long)+0x2ab) [0x562b8437c5cb]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 3:
(BlueFS::reclaim_blocks(unsigned int, unsigned long,
std::vector<AllocExtent,
mempool::pool_allocator<(mempool::pool_index_t)7, AllocExtent>
>*)+0x22a) [0x562b8435109a]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 4:
(BlueStore::_balance_bluefs_freespace(std::vector<bluestore_pextent_t,
std::allocator<bluestore_pextent_t> >*)+0x28e) [0x562b84270dae]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 5:
(BlueStore::_kv_sync_thread()+0x164a) [0x562b84273eea]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 6:
(BlueStore::KVSyncThread::entry()+0xd) [0x562b842ad9dd]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 7: (()+0x76ba)
[0x7f71560c76ba]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 8: (clone()+0x6d)
[0x7f71547953dd]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: NOTE: a copy of the
executable, or `objdump -rdS <executable>` is needed to interpret
this.
If anyone has any idea how to restore those PGs, please point me
in the right direction.
By the way, manually restoring the folder that I deleted in step 5
makes the OSD go up again.
Thanks
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com