Hi again,
I'm starting to feel really unlucky here...
At the moment, the situation is "sort of okay":
1387 active+clean
11 active+clean+inconsistent
7 active+recovery_wait+degraded
1 active+recovery_wait+undersized+degraded+remapped
1 active+undersized+degraded+remapped+wait_backfill
1 active+undersized+degraded+remapped+inconsistent+backfilling
To ensure nothing is in the way, I disabled both scrubbing and deep
scrubbing for the time being.
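(For anyone who wants to do the same: the cluster-wide flags are the usual
way to do this, e.g.:

  ceph osd set noscrub
  ceph osd set nodeep-scrub

They can be reverted later with "ceph osd unset noscrub" and
"ceph osd unset nodeep-scrub".)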
However, random OSDs (still on Hammer) keep crashing with the error I
mentioned earlier (osd/ReplicatedPG.cc: 10115: FAILED assert(r >= 0)).
It looked like they started crashing when hitting the PG that is currently
backfilling, so I set the nobackfill flag.
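(That is:

  ceph osd set nobackfill

to be unset again with "ceph osd unset nobackfill" once things look stable.)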
For now, the crashing seems to have stopped. However, the cluster is quite
slow when accessing the affected PG via KVM/QEMU (RBD).
Recap:
* All monitors run Infernalis.
* One OSD node runs Infernalis.
* All other OSD nodes run Hammer.
* One OSD on Infernalis is set to "out" and is stopped (see the commands
  after this list). This OSD seemed to contain one inconsistent PG.
* Backfilling started.
* After hours and hours of backfilling, OSDs started to crash.
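(For completeness, taking an OSD out and stopping it boils down to
something like:

  ceph osd out <osd-id>
  systemctl stop ceph-osd@<osd-id>

where <osd-id> is just a placeholder. The second command applies to the
systemd-based Infernalis node; the older init scripts use
"service ceph stop osd.<osd-id>" instead.)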
Other than restarting the "out" and stopped OSD (which I haven't tried
yet), I'm quite lost.
Hopefully someone has some pointers for me.
Regards,
Kees
On 20-08-18 13:23, Kees Meijs wrote:
The given PG is back online, phew...
Meanwhile, some OSDs still on Hammer seem to crash with errors like:
2018-08-20 13:06:33.819569 7f8962b2f700 -1 osd/ReplicatedPG.cc: In
function 'void ReplicatedPG::scan_range(int, int,
PG::BackfillInterval*, ThreadPool::TPHandle&)' thread 7f8962b2f700
time 2018-08-20 13:06:33.709922
osd/ReplicatedPG.cc: 10115: FAILED assert(r >= 0)
Restarting the OSDs seems to work.
K.
On 20-08-18 13:14, Kees Meijs wrote:
Bad news: I've got a PG stuck in down+peering now.