Yes, good idea.

I was looking at the «WBThrottle» feature, but I'll go for logging instead.
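For reference, a minimal sketch of the logging change discussed below, assuming the usual ceph.conf layout (debug levels this high are very verbose, so revert them afterwards):

```
[osd]
    ; verbose OSD logging, as suggested below (debug_osd=20)
    debug osd = 20
```

The same value can also be injected at runtime without a restart, e.g. with `ceph tell osd.* injectargs '--debug-osd 20'`.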


On Wednesday, 4 March 2015 at 17:10 +0100, Alexandre DERUMIER wrote:
> >>Only writes ;) 
> 
> ok, so maybe some background operations (snap trimming, scrubbing...).
> 
> maybe debug_osd=20 could give you more logs?
> 
> 
> ----- Original Message -----
> From: "Olivier Bonvalet" <ceph.l...@daevel.fr>
> To: "aderumier" <aderum...@odiso.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Wednesday, 4 March 2015 16:42:13
> Subject: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly
> 
> Only writes ;) 
> 
> 
> On Wednesday, 4 March 2015 at 16:19 +0100, Alexandre DERUMIER wrote:
> > >>The change is only on the OSDs (and not on the OSD journals). 
> > 
> > do you see twice iops for read and write ? 
> > 
> > if only reads, maybe a read-ahead bug could explain this. 
> > 
> > ----- Original Message -----
> > From: "Olivier Bonvalet" <ceph.l...@daevel.fr>
> > To: "aderumier" <aderum...@odiso.com>
> > Cc: "ceph-users" <ceph-users@lists.ceph.com>
> > Sent: Wednesday, 4 March 2015 15:13:30
> > Subject: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly
> > 
> > Ceph health is OK, yes. 
> > 
> > The «firefly-upgrade-cluster-IO.png» graph shows IO stats as seen by 
> > Ceph: there is no change between dumpling and firefly. The change is 
> > only on the OSDs (and not on the OSD journals). 
> > 
> > 
> > On Wednesday, 4 March 2015 at 15:05 +0100, Alexandre DERUMIER wrote: 
> > > >>The load problem is permanent: I see twice the IO/s on the HDDs since firefly. 
> > > 
> > > Oh, permanent, that's strange. (If you don't see more traffic coming from 
> > > clients, I don't understand...) 
> > > 
> > > do you also see twice the IOs/ops in "ceph -w" stats? 
> > > 
> > > is the ceph health OK? 
> > > 
> > > 
> > > 
> > > ----- Original Message -----
> > > From: "Olivier Bonvalet" <ceph.l...@daevel.fr>
> > > To: "aderumier" <aderum...@odiso.com>
> > > Cc: "ceph-users" <ceph-users@lists.ceph.com>
> > > Sent: Wednesday, 4 March 2015 14:49:41
> > > Subject: Re: [ceph-users] Perf problem after upgrade from dumpling to
> > > firefly
> > > 
> > > Thanks Alexandre. 
> > > 
> > > The load problem is permanent: I see twice the IO/s on the HDDs since firefly. 
> > > And yes, the problem hangs production at night during snap trimming. 
> > > 
> > > I suppose there is a new OSD parameter which changes the behavior of the 
> > > journal, or something like that. But I didn't find anything about it. 
> > > 
> > > Olivier 
> > > 
> > > On Wednesday, 4 March 2015 at 14:44 +0100, Alexandre DERUMIER wrote: 
> > > > Hi, 
> > > > 
> > > > maybe this is related ?: 
> > > > 
> > > > http://tracker.ceph.com/issues/9503 
> > > > "Dumpling: removing many snapshots in a short time makes OSDs go 
> > > > berserk" 
> > > > 
> > > > http://tracker.ceph.com/issues/9487 
> > > > "dumpling: snaptrimmer causes slow requests while backfilling. 
> > > > osd_snap_trim_sleep not helping" 
> > > > 
> > > > http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-December/045116.html
> > > >  
> > > > 
> > > > 
> > > > 
> > > > I think it's already backported in dumpling; not sure it's already done 
> > > > for firefly. 
> > > > 
> > > > 
> > > > Alexandre 
> > > > 
> > > > 
> > > > 
> > > > ----- Original Message -----
> > > > From: "Olivier Bonvalet" <ceph.l...@daevel.fr>
> > > > To: "ceph-users" <ceph-users@lists.ceph.com>
> > > > Sent: Wednesday, 4 March 2015 12:10:30
> > > > Subject: [ceph-users] Perf problem after upgrade from dumpling to firefly
> > > > 
> > > > Hi, 
> > > > 
> > > > Last Saturday I upgraded my production cluster from dumpling to emperor 
> > > > (since we were successfully using it on a test cluster). 
> > > > A couple of hours later, OSDs started failing: some of them were marked 
> > > > as down by Ceph, probably because of IO starvation. I set the cluster 
> > > > to «noout», restarted the downed OSDs, then let them recover. 24h later, 
> > > > same problem (at nearly the same hour). 
> > > > 
> > > > So I chose to upgrade directly to firefly, which is maintained. 
> > > > Things are better, but the cluster is slower than with dumpling. 
> > > > 
> > > > The main problem seems to be that the OSDs handle twice as many write 
> > > > operations per second: 
> > > > https://daevel.fr/img/firefly/firefly-upgrade-OSD70-IO.png 
> > > > https://daevel.fr/img/firefly/firefly-upgrade-OSD71-IO.png 
> > > > 
> > > > But the journal doesn't change (an SSD dedicated to OSD 70+71+72): 
> > > > https://daevel.fr/img/firefly/firefly-upgrade-OSD70+71-journal.png 
> > > > 
> > > > Neither does the node bandwidth: 
> > > > https://daevel.fr/img/firefly/firefly-upgrade-dragan-bandwidth.png 
> > > > 
> > > > Nor the whole-cluster IO activity: 
> > > > https://daevel.fr/img/firefly/firefly-upgrade-cluster-IO.png 
> > > > 
> > > > Some background: 
> > > > The cluster is split into pools on «full SSD» OSDs and «HDD + SSD 
> > > > journal» OSDs. Only the «HDD+SSD» OSDs seem to be affected. 
> > > > 
> > > > I have 9 OSDs per «HDD+SSD» node (9 HDDs and 3 SSDs), and only 3 «HDD+SSD» 
> > > > nodes (so a total of 27 «HDD+SSD» OSDs). 
> > > > 
> > > > The IO peak between 03h00 and 09h00 corresponds to the snapshot rotation 
> > > > (= «rbd snap rm» operations). 
> > > > osd_snap_trim_sleep has been set to 0.8 for months. 
> > > > Yesterday I tried reducing osd_pg_max_concurrent_snap_trims to 1. It 
> > > > doesn't seem to really help. 
> > > > 
> > > > The only thing that seems to help is reducing osd_disk_threads from 8 
> > > > to 1. 
> > > > 
> > > > So, any idea what's happening? 
> > > > 
> > > > Thanks for any help, 
> > > > Olivier 
> > > > 
> > > > 
> > > 
> > 
> 
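For reference, the OSD tunables discussed in this thread can be pinned in ceph.conf; a sketch using the values mentioned above (restart the OSDs after editing, or inject the values at runtime):

```
[osd]
    ; throttle snapshot trimming (value used for months, per the thread)
    osd snap trim sleep = 0.8
    ; tried yesterday, reduced from the default to 1
    osd pg max concurrent snap trims = 1
    ; the only change that seemed to help: reduced from 8 to 1
    osd disk threads = 1
```

The same values can be injected without a restart, e.g. `ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.8'`.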

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
