On 21.08.2013 17:32, Samuel Just wrote:
Have you tried setting osd_recovery_clone_overlap to false?  That
seemed to help with Stefan's issue.
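For reference, that would look roughly like this in ceph.conf (a sketch only; depending on the release the option may be spelled "osd recover clone overlap" rather than "osd recovery clone overlap"):

[osd]
    # disable use of clone overlap data during recovery
    osd recovery clone overlap = false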

This might sound a bit harsh, but maybe that's due to my limited English skills ;-)

I still think that Ceph's recovery system is broken by design. If an OSD comes back after being offline, all write requests for PGs where this OSD is primary are directed to it immediately. If it is not up to date for a PG, it has to recover the affected objects immediately, which costs 4 MB per block. If you have lots of small writes spread across your OSDs and PGs, you're out of luck, because the OSD has to recover ALL of its PGs immediately, or at least a lot of them, which can't work. This is totally crazy.

I think the right way would be:
1.) if an OSD goes down, the replicas become primaries

or

2.) an OSD which does not have an up-to-date copy of a PG should redirect requests to the OSD holding the second or third replica.

Either way you would get a really smooth, slow recovery without any stress, even under heavy 4K workloads like RBD-backed VMs.

Thanks for reading!

Greets Stefan


-Sam

On Wed, Aug 21, 2013 at 8:28 AM, Mike Dawson <mike.daw...@cloudapt.com> wrote:
Sam/Josh,

We upgraded from 0.61.7 to 0.67.1 during a maintenance window this morning,
hoping it would improve this situation, but there was no appreciable change.

One node in our cluster fsck'ed after a reboot and got a bit behind. Our
instances backed by RBD volumes were OK at that point, but once the node
booted fully and the OSDs started, all Windows instances with rbd volumes
experienced very choppy performance and were unable to ingest video
surveillance traffic and commit it to disk. Once the cluster got back to
HEALTH_OK, they resumed normal operation.

I tried for a time with conservative recovery settings (osd max backfills =
1, osd recovery op priority = 1, and osd recovery max active = 1). No
improvement for the guests. So I went to more aggressive settings to get
things moving faster. That decreased the duration of the outage.
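For reference, the conservative settings above correspond roughly to this ceph.conf fragment, and they can usually be changed at runtime with injectargs (a sketch only; exact syntax may vary between releases):

[osd]
    osd max backfills = 1
    osd recovery op priority = 1
    osd recovery max active = 1

# inject at runtime without restarting the OSDs (illustrative)
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_op_priority 1 --osd_recovery_max_active 1'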

During the entire period of recovery/backfill, the network looked fine... nowhere
close to saturation. iowait on all drives looked fine as well.

Any ideas?

Thanks,
Mike Dawson



On 8/14/2013 3:04 AM, Stefan Priebe - Profihost AG wrote:

The same problem still occurs. I will need to check when I have time to
gather logs again.

On 14.08.2013 01:11, Samuel Just wrote:

I'm not sure, but your logs did show that you had >16 recovery ops in
flight, so it's worth a try.  If it doesn't help, collect the same set
of logs and I'll look again.  Also, there are a few other patches
between 0.61.7 and the current cuttlefish branch which may help.
-Sam

On Tue, Aug 13, 2013 at 2:03 PM, Stefan Priebe - Profihost AG
<s.pri...@profihost.ag> wrote:


On 13.08.2013 at 22:43, Samuel Just <sam.j...@inktank.com> wrote:

I just backported a couple of patches from next to fix a bug where we
weren't respecting the osd_recovery_max_active config in some cases
(1ea6b56170fc9e223e7c30635db02fa2ad8f4b4e).  You can either try the
current cuttlefish branch or wait for a 61.8 release.


Thanks! Are you sure that this is the issue? I don't believe it, but
I'll give it a try. Some weeks ago I already tested a branch from Sage where he
fixed a race regarding max active, so active recovery was capped at 1, but
the issue didn't go away.

Stefan

-Sam

On Mon, Aug 12, 2013 at 10:34 PM, Samuel Just <sam.j...@inktank.com>
wrote:

I got swamped today.  I should be able to look tomorrow.  Sorry!
-Sam

On Mon, Aug 12, 2013 at 9:39 PM, Stefan Priebe - Profihost AG
<s.pri...@profihost.ag> wrote:

Did you take a look?

Stefan

On 11.08.2013 at 05:50, Samuel Just <sam.j...@inktank.com> wrote:

Great!  I'll take a look on Monday.
-Sam

On Sat, Aug 10, 2013 at 12:08 PM, Stefan Priebe
<s.pri...@profihost.ag> wrote:

Hi Samuel,

On 09.08.2013 23:44, Samuel Just wrote:

I think Stefan's problem is probably distinct from Mike's.

Stefan: Can you reproduce the problem with

debug osd = 20
debug filestore = 20
debug ms = 1
debug optracker = 20

on a few osds (including the restarted osd), and upload those osd logs
along with the ceph.log from before killing the osd until after the
cluster becomes clean again?



Done - you'll find the logs in the cephdrop folder:
slow_requests_recovering_cuttlefish

osd.52 was the one recovering

Thanks!

Greets,
Stefan
