Hi all,
Maybe this will help:
The issue is with shards 3, 4 and 5 of PG 6.3f.
Below are the logs of OSDs 16, 17 and 36 (the ones crashing on startup).
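For reference, the shard-to-OSD mapping below can be cross-checked against the
acting set of the PG (for an EC pool, the position in the acting/up set is the
shard number), e.g. with something like:

ceph pg map 6.3f
ceph pg 6.3f query | grep -A10 '"acting"'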
*Log OSD.16 (shard 4):*
2018-06-08 08:35:01.727261 7f4c585e3700 -1
bluestore(/var/lib/ceph/osd/ceph-16) _txc_add_transaction error (2) No such
file or directory not handled on operation 30 (op 0, counting from 0)
2018-06-08 08:35:01.727273 7f4c585e3700 -1
bluestore(/var/lib/ceph/osd/ceph-16) ENOENT on clone suggests osd bug
2018-06-08 08:35:01.727274 7f4c585e3700 0
bluestore(/var/lib/ceph/osd/ceph-16) transaction dump:
{
"ops": [
{
"op_num": 0,
"op_name": "clonerange2",
"collection": "6.3fs4_head",
"src_oid":
"4#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903d0",
"dst_oid":
"4#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"src_offset": 950272,
"len": 98304,
"dst_offset": 950272
},
{
"op_num": 1,
"op_name": "remove",
"collection": "6.3fs4_head",
"oid":
"4#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903d0"
},
{
"op_num": 2,
"op_name": "setattrs",
"collection": "6.3fs4_head",
"oid":
"4#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"attr_lens": {
"_": 297,
"hinfo_key": 18,
"snapset": 35
}
},
{
"op_num": 3,
"op_name": "clonerange2",
"collection": "6.3fs4_head",
"src_oid":
"4#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903cf",
"dst_oid":
"4#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"src_offset": 679936,
"len": 274432,
"dst_offset": 679936
},
{
"op_num": 4,
"op_name": "remove",
"collection": "6.3fs4_head",
"oid":
"4#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903cf"
},
{
"op_num": 5,
"op_name": "setattrs",
"collection": "6.3fs4_head",
"oid":
"4#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"attr_lens": {
"_": 297,
"hinfo_key": 18,
"snapset": 35
}
},
{
"op_num": 6,
"op_name": "nop"
},
{
"op_num": 7,
"op_name": "op_omap_rmkeyrange",
"collection": "6.3fs4_head",
"oid": "4#6:fc000000::::head#",
"first": "0000011124.00000000000000590799",
"last": "4294967295.18446744073709551615"
},
{
"op_num": 8,
"op_name": "omap_setkeys",
"collection": "6.3fs4_head",
"oid": "4#6:fc000000::::head#",
"attr_lens": {
"_biginfo": 597,
"_epoch": 4,
"_info": 953,
"can_rollback_to": 12,
"rollback_info_trimmed_to": 12
}
}
]
}
2018-06-08 08:35:01.730584 7f4c585e3700 -1
/home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: In function
'void BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)' thread 7f4c585e3700 time 2018-06-08
08:35:01.727379
/home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: 9363:
FAILED assert(0 == "unexpected error")
ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x558e08ba4202]
2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)+0x15fa) [0x558e08a55c3a]
3: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
std::vector<ObjectStore::Transaction,
std::allocator<ObjectStore::Transaction> >&,
boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546)
[0x558e08a572a6]
4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*,
ObjectStore::Transaction&&, Context*, Context*, Context*,
boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f)
[0x558e085fa37f]
5: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*,
ThreadPool::TPHandle*)+0x6c) [0x558e0857db5c]
6: (OSD::process_peering_events(std::__cxx11::list<PG*,
std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x442) [0x558e085abec2]
7: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*,
ThreadPool::TPHandle&)+0x2c) [0x558e0861a91c]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x558e08bab3a8]
9: (ThreadPool::WorkThread::entry()+0x10) [0x558e08bac540]
10: (()+0x7494) [0x7f4c709ca494]
11: (clone()+0x3f) [0x7f4c6fa51aff]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
*Log OSD.17 (shard 5):*
2018-06-08 08:35:56.745249 7fe3fa687700 -1
bluestore(/var/lib/ceph/osd/ceph-17) _txc_add_transaction error (2) No such
file or directory not handled on operation 30 (op 0, counting from 0)
2018-06-08 08:35:56.745264 7fe3fa687700 -1
bluestore(/var/lib/ceph/osd/ceph-17) ENOENT on clone suggests osd bug
2018-06-08 08:35:56.745266 7fe3fa687700 0
bluestore(/var/lib/ceph/osd/ceph-17) transaction dump:
{
"ops": [
{
"op_num": 0,
"op_name": "clonerange2",
"collection": "6.3fs5_head",
"src_oid":
"5#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903d0",
"dst_oid":
"5#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"src_offset": 950272,
"len": 98304,
"dst_offset": 950272
},
{
"op_num": 1,
"op_name": "remove",
"collection": "6.3fs5_head",
"oid":
"5#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903d0"
},
{
"op_num": 2,
"op_name": "setattrs",
"collection": "6.3fs5_head",
"oid":
"5#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"attr_lens": {
"_": 297,
"hinfo_key": 18,
"snapset": 35
}
},
{
"op_num": 3,
"op_name": "clonerange2",
"collection": "6.3fs5_head",
"src_oid":
"5#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903cf",
"dst_oid":
"5#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"src_offset": 679936,
"len": 274432,
"dst_offset": 679936
},
{
"op_num": 4,
"op_name": "remove",
"collection": "6.3fs5_head",
"oid":
"5#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903cf"
},
{
"op_num": 5,
"op_name": "setattrs",
"collection": "6.3fs5_head",
"oid":
"5#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"attr_lens": {
"_": 297,
"hinfo_key": 18,
"snapset": 35
}
},
{
"op_num": 6,
"op_name": "nop"
},
{
"op_num": 7,
"op_name": "op_omap_rmkeyrange",
"collection": "6.3fs5_head",
"oid": "5#6:fc000000::::head#",
"first": "0000011124.00000000000000590799",
"last": "4294967295.18446744073709551615"
},
{
"op_num": 8,
"op_name": "omap_setkeys",
"collection": "6.3fs5_head",
"oid": "5#6:fc000000::::head#",
"attr_lens": {
"_biginfo": 586,
"_epoch": 4,
"_info": 953,
"can_rollback_to": 12,
"rollback_info_trimmed_to": 12
}
}
]
}
2018-06-08 08:35:56.748436 7fe3fa687700 -1
/home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: In function
'void BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)' thread 7fe3fa687700 time 2018-06-08
08:35:56.745458
/home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: 9363:
FAILED assert(0 == "unexpected error")
ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x55695de1f202]
2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)+0x15fa) [0x55695dcd0c3a]
3: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
std::vector<ObjectStore::Transaction,
std::allocator<ObjectStore::Transaction> >&,
boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546)
[0x55695dcd22a6]
4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*,
ObjectStore::Transaction&&, Context*, Context*, Context*,
boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f)
[0x55695d87537f]
5: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*,
ThreadPool::TPHandle*)+0x6c) [0x55695d7f8b5c]
6: (OSD::process_peering_events(std::__cxx11::list<PG*,
std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x442) [0x55695d826ec2]
7: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*,
ThreadPool::TPHandle&)+0x2c) [0x55695d89591c]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x55695de263a8]
9: (ThreadPool::WorkThread::entry()+0x10) [0x55695de27540]
10: (()+0x7494) [0x7fe412a6e494]
11: (clone()+0x3f) [0x7fe411af5aff]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
*Log OSD.36 (shard 3):*
-3> 2018-06-08 09:25:36.660055 7f43c9262700 -1
bluestore(/var/lib/ceph/osd/ceph-36) _txc_add_transaction error (2) No such
file or directory not handled on operation 30 (op 0, counting from 0)
-2> 2018-06-08 09:25:36.660068 7f43c9262700 -1
bluestore(/var/lib/ceph/osd/ceph-36) ENOENT on clone suggests osd bug
-1> 2018-06-08 09:25:36.660070 7f43c9262700 0
bluestore(/var/lib/ceph/osd/ceph-36) transaction dump:
{
"ops": [
{
"op_num": 0,
"op_name": "clonerange2",
"collection": "*6.3fs3*_head",
"src_oid":
"3#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903d0",
"dst_oid":
"3#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"src_offset": 950272,
"len": 98304,
"dst_offset": 950272
},
{
"op_num": 1,
"op_name": "remove",
"collection": "6.3fs3_head",
"oid":
"3#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903d0"
},
{
"op_num": 2,
"op_name": "setattrs",
"collection": "6.3fs3_head",
"oid":
"3#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"attr_lens": {
"_": 297,
"hinfo_key": 18,
"snapset": 35
}
},
{
"op_num": 3,
"op_name": "clonerange2",
"collection": "6.3fs3_head",
"src_oid":
"3#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903cf",
"dst_oid":
"3#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"src_offset": 679936,
"len": 274432,
"dst_offset": 679936
},
{
"op_num": 4,
"op_name": "remove",
"collection": "6.3fs3_head",
"oid":
"3#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#903cf"
},
{
"op_num": 5,
"op_name": "setattrs",
"collection": "6.3fs3_head",
"oid":
"3#6:fc074663:::rbd_data.5.6c1d9574b0dc51.00000000000312db:head#",
"attr_lens": {
"_": 297,
"hinfo_key": 18,
"snapset": 35
}
},
{
"op_num": 6,
"op_name": "nop"
},
{
"op_num": 7,
"op_name": "op_omap_rmkeyrange",
"collection": "6.3fs3_head",
"oid": "3#6:fc000000::::head#",
"first": "0000011124.00000000000000590799",
"last": "4294967295.18446744073709551615"
},
{
"op_num": 8,
"op_name": "omap_setkeys",
"collection": "6.3fs3_head",
"oid": "3#6:fc000000::::head#",
"attr_lens": {
"_biginfo": 743,
"_epoch": 4,
"_info": 953,
"can_rollback_to": 12,
"rollback_info_trimmed_to": 12
}
}
]
}
0> 2018-06-08 09:25:36.663334 7f43c9262700 -1
/home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: In function
'void BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)' thread 7f43c9262700 time 2018-06-08
09:25:36.660177
/home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: 9363:
FAILED assert(0 == "unexpected error")
ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x557025013202]
2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)+0x15fa) [0x557024ec4c3a]
3: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
std::vector<ObjectStore::Transaction,
std::allocator<ObjectStore::Transaction> >&,
boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546)
[0x557024ec62a6]
4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*,
ObjectStore::Transaction&&, Context*, Context*, Context*,
boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f)
[0x557024a6937f]
5: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*,
ThreadPool::TPHandle*)+0x6c) [0x5570249ecb5c]
6: (OSD::process_peering_events(std::__cxx11::list<PG*,
std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x442) [0x557024a1aec2]
7: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*,
ThreadPool::TPHandle&)+0x2c) [0x557024a8991c]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x55702501a3a8]
9: (ThreadPool::WorkThread::entry()+0x10) [0x55702501b540]
10: (()+0x7494) [0x7f43e0e48494]
11: (clone()+0x3f) [0x7f43dfecfaff]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
Kind regards,
Caspar Smit
2018-06-08 16:38 GMT+02:00 Caspar Smit <[email protected]>:
> Hi all,
>
> I seem to be hitting these tracker issues:
>
> https://tracker.ceph.com/issues/23145
> http://tracker.ceph.com/issues/24422
>
> PGs 6.1 and 6.3f are the ones having issues.
>
> When I list all PGs of a down OSD with:
>
> ceph-objectstore-tool --dry-run --type bluestore --data-path
> /var/lib/ceph/osd/ceph-17/ --op list-pgs
>
> There are a lot of 'duplicate' pgids in the output, like these (also for other PGs):
>
> 6.3fs3
> 6.3fs5
>
> Is that normal? I would assume that different EC shards of the same PG
> would end up on separate OSDs.
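>
> (Side note: to count how many shards of each PG this OSD holds, a one-liner
> along these lines should work, assuming the 6.3fs<N> pgid format shown above:
>
> ceph-objectstore-tool --dry-run --type bluestore --data-path \
>     /var/lib/ceph/osd/ceph-17/ --op list-pgs \
>     | sed 's/s[0-9]*$//' | sort | uniq -c | sort -rn | head
>
> Anything with a count above 1 would be a PG with multiple shards on the same
> OSD.)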
>
> We still have 4 OSDs down and 2 PGs down+remapped, and I can't find any
> way to get the crashed OSDs back up.
>
> pg 6.1 is down+remapped, acting [6,3,2147483647,29,2147483647,2147483647]
> pg 6.3f is down+remapped, acting [20,24,2147483647,2147483647,3,28]
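>
> (Side note: before trying anything destructive, an offline export of the
> affected shard from one of the down OSDs should still be possible, roughly
> like this, with the pgid and file path only as examples:
>
> ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-17/ \
>     --pgid 6.3fs5 --op export --file /root/pg-6.3fs5.export
> )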
>
> Kind regards,
> Caspar Smit
>
> 2018-06-08 8:53 GMT+02:00 Caspar Smit <[email protected]>:
>
>> Update:
>>
>> I've unset nodown to let it continue, but now 4 OSDs are down and cannot
>> be brought up again. Here's what the logfile reads:
>>
>> 2018-06-08 08:35:01.716245 7f4c58de4700 0 log_channel(cluster) log [INF]
>> : 6.e3s0 continuing backfill to osd.37(4) from (10864'911406,11124'921472]
>> 6:c7d71bbd:::rbd_data.5.6c1d9574b0dc51.0000000000bf38b9:head to
>> 11124'921472
>> 2018-06-08 08:35:01.727261 7f4c585e3700 -1
>> bluestore(/var/lib/ceph/osd/ceph-16)
>> _txc_add_transaction error (2) No such file or directory not handled on
>> operation 30 (op 0, counting from 0)
>> 2018-06-08 08:35:01.727273 7f4c585e3700 -1
>> bluestore(/var/lib/ceph/osd/ceph-16)
>> ENOENT on clone suggests osd bug
>>
>> 2018-06-08 08:35:01.730584 7f4c585e3700 -1 /home/builder/source/ceph-12.2
>> .2/src/os/bluestore/BlueStore.cc: In function 'void
>> BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>> ObjectStore::Transaction*)' thread 7f4c585e3700 time 2018-06-08
>> 08:35:01.727379
>> /home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: 9363:
>> FAILED assert(0 == "unexpected error")
>>
>> ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous
>> (stable)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x558e08ba4202]
>> 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>> ObjectStore::Transaction*)+0x15fa) [0x558e08a55c3a]
>> 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
>> std::vector<ObjectStore::Transaction,
>> std::allocator<ObjectStore::Transaction>
>> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546)
>> [0x558e08a572a6]
>> 4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*,
>> ObjectStore::Transaction&&, Context*, Context*, Context*,
>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f)
>> [0x558e085fa37f]
>> 5: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*,
>> ThreadPool::TPHandle*)+0x6c) [0x558e0857db5c]
>> 6: (OSD::process_peering_events(std::__cxx11::list<PG*,
>> std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x442) [0x558e085abec2]
>> 7: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*,
>> ThreadPool::TPHandle&)+0x2c) [0x558e0861a91c]
>> 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x558e08bab3a8]
>> 9: (ThreadPool::WorkThread::entry()+0x10) [0x558e08bac540]
>> 10: (()+0x7494) [0x7f4c709ca494]
>> 11: (clone()+0x3f) [0x7f4c6fa51aff]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>>
>> Any help is highly appreciated.
>>
>> Kind regards,
>> Caspar Smit
>>
>>
>> 2018-06-08 7:57 GMT+02:00 Caspar Smit <[email protected]>:
>>
>>> Well, I let it run with the nodown flag set and it looked like it would
>>> finish, but it all went wrong somewhere:
>>>
>>> This is now the state:
>>>
>>> health: HEALTH_ERR
>>> nodown flag(s) set
>>> 5602396/94833780 objects misplaced (5.908%)
>>> Reduced data availability: 143 pgs inactive, 142 pgs
>>> peering, 7 pgs stale
>>> Degraded data redundancy: 248859/94833780 objects degraded
>>> (0.262%), 194 pgs unclean, 21 pgs degraded, 12 pgs undersized
>>> 11 stuck requests are blocked > 4096 sec
>>>
>>> pgs: 13.965% pgs not active
>>> 248859/94833780 objects degraded (0.262%)
>>> 5602396/94833780 objects misplaced (5.908%)
>>> 830 active+clean
>>> 75 remapped+peering
>>> 66 peering
>>> 26 active+remapped+backfill_wait
>>> 6 active+undersized+degraded+remapped+backfill_wait
>>> 6 active+recovery_wait+degraded+remapped
>>> 3 active+undersized+degraded+remapped+backfilling
>>> 3 stale+active+undersized+degraded+remapped+backfill_wait
>>> 3 stale+active+remapped+backfill_wait
>>> 2 active+recovery_wait+degraded
>>> 2 active+remapped+backfilling
>>> 1 activating+degraded+remapped
>>> 1 stale+remapped+peering
>>>
>>>
>>> # ceph health detail shows:
>>>
>>> REQUEST_STUCK 11 stuck requests are blocked > 4096 sec
>>> 11 ops are blocked > 16777.2 sec
>>> osds 4,7,23,24 have stuck requests > 16777.2 sec
>>>
>>>
>>> So what happened, and what should I do now?
>>>
>>> Thank you very much for any help.
>>>
>>> Kind regards,
>>> Caspar
>>>
>>>
>>> 2018-06-07 13:33 GMT+02:00 Sage Weil <[email protected]>:
>>>
>>>> On Wed, 6 Jun 2018, Caspar Smit wrote:
>>>> > Hi all,
>>>> >
>>>> > We have a Luminous 12.2.2 cluster with 3 nodes, and I recently added a
>>>> > node to it.
>>>> >
>>>> > osd-max-backfills is at the default of 1, so backfilling didn't go
>>>> > very fast, but that doesn't matter.
>>>> >
>>>> > Once it started backfilling everything looked ok:
>>>> >
>>>> > ~300 pgs in backfill_wait
>>>> > ~10 pgs backfilling (roughly the number of new OSDs)
>>>> >
>>>> > But I noticed the degraded objects increasing a lot. I presume a PG
>>>> > that is in backfill_wait state doesn't accept any new writes anymore?
>>>> > Hence the increasing number of degraded objects?
>>>> >
>>>> > So far so good, but once in a while I noticed a random OSD flapping
>>>> > (they come back up automatically). This isn't because the disk is
>>>> > saturated, but because of a driver/controller/kernel incompatibility
>>>> > that 'hangs' the disk for a short time (SCSI abort_task error in
>>>> > syslog). Investigating further, I noticed this was already the case
>>>> > before the node expansion.
>>>> >
>>>> > These flapping OSDs result in lots of PG states which are a bit
>>>> > worrying:
>>>> >
>>>> > 109 active+remapped+backfill_wait
>>>> > 80 active+undersized+degraded+remapped+backfill_wait
>>>> > 51 active+recovery_wait+degraded+remapped
>>>> > 41 active+recovery_wait+degraded
>>>> > 27 active+recovery_wait+undersized+degraded+remapped
>>>> > 14 active+undersized+remapped+backfill_wait
>>>> > 4 active+undersized+degraded+remapped+backfilling
>>>> >
>>>> > I think the recovery_wait is more important than the backfill_wait,
>>>> > so I would like to prioritize those, because the recovery_wait was
>>>> > triggered by the flapping OSDs.
>>>>
>>>> Just a note: this is fixed in mimic. Previously, we would choose the
>>>> highest-priority PG to start recovery on at the time, but once recovery
>>>> had started, the appearance of a new PG with a higher priority (e.g.,
>>>> because it finished peering after the others) wouldn't preempt/cancel
>>>> the other PG's recovery, so you would get behavior like the above.
>>>>
>>>> Mimic implements that preemption, so you should not see behavior like
>>>> this. (If you do, then the function that assigns a priority score to a
>>>> PG needs to be tweaked.)
>>>>
>>>> sage
>>>>
>>>
>>>
>>
>