On Wed, 15 May 2013, Chen, Xiaoxi wrote:

> >How responsive generally is the machine under load?  Is there available CPU?
>       The machine works well, and the affected OSDs tend to be the same 
> ones, seemingly because they have relatively slower disks (the disk type is 
> the same but the latency is a bit higher, 8ms -> 10ms).
>       
>       Top shows no idle %, but there is still 30+% io_wait; my colleague 
> tells me that io_wait can be treated as free CPU.
> 
> Another piece of information: offloading the heartbeat to a 1Gb NIC doesn't 
> solve the problem. What's more, when we run a random write test, we can 
> still see this flipping happen. So I would say it may be related to the CPU 
> scheduler: the heartbeat thread (in a busy OSD) fails to get enough CPU 
> cycles.

Can you try reproducing with logs?  If we can reproduce it, that will give us 
some clues.
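
To capture that, something along these lines on the affected OSD hosts should 
do it (just a sketch; adjust the osd id to one of the flapping OSDs), either 
in ceph.conf:

  [osd]
      debug ms = 1
      debug osd = 20

or injected at runtime without a restart:

  ceph tell osd.0 injectargs '--debug-ms 1 --debug-osd 20'

By default the resulting logs end up in /var/log/ceph/ceph-osd.*.log on that 
host.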

sage

> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Sage Weil
> Sent: May 15, 2013 7:23
> To: Chen, Xiaoxi
> Cc: Mark Nelson; [email protected]; [email protected]
> Subject: Re: [ceph-users] OSD state flipping when cluster-network in high 
> utilization
> 
> On Tue, 14 May 2013, Chen, Xiaoxi wrote:
> > I like the idea of leaving the ping in the cluster network because it can 
> > help us detect switch/NIC failure.
> > 
> > What confuses me is that I keep pinging every ceph node's cluster IP, and 
> > it is OK during the whole run with less than 1 ms latency, so why does the 
> > heartbeat still suffer? TOP shows my CPU is not 100% utilized (with ~30% 
> > io wait). Enabling jumbo frames **seems** to make things worse (just a 
> > feeling, no data to support it).
> 
> I say "ping" in the general sense.. it's not using ICMP, but sending small 
> messages over a TCP session, doing some minimal processing on the other end, 
> and sending them back.  If the machine is heavily loaded and that thread 
> doesn't get scheduled or somehow blocks, it may be problematic.
> 
> How responsive generally is the machine under load?  Is there available CPU?
> 
> We can try to enable debugging to see what is going on.. 'debug ms = 1' 
> and 'debug osd = 20' are everything we would need, but they will incur 
> additional load themselves and may spoil the experiment...
> 
> sage
> 
> > 
> > Sent from my iPhone
> > 
> > On 2013-5-14, at 23:36, "Mark Nelson" <[email protected]> wrote:
> > 
> > > On 05/14/2013 10:30 AM, Sage Weil wrote:
> > >> On Tue, 14 May 2013, Chen, Xiaoxi wrote:
> > >>> 
> > >>> Hi
> > >>> 
> > >>>   We are suffering from our OSDs flipping between up and down (OSD X is 
> > >>> voted down due to 3 missed pings, and after a while it tells 
> > >>> the monitor "map xxx wrongly marked me down"). This happens while we 
> > >>> are running a sequential write performance test on top of RBDs, and the 
> > >>> cluster network NICs are at really high utilization (8Gb/s+ on a 10Gb 
> > >>> network).
> > >>> 
> > >>>          Is this expected behavior? And how can I prevent it from happening?
> > >> 
> > >> You can increase the heartbeat grace period.  The pings are handled 
> > >> by a separate thread on the backside interface (if there is one).  
> > >> If you are missing pings then the network or scheduler is 
> > >> preventing those (small) messages from being processed (there is 
> > >> almost no lock contention in that path).  Which means it really is 
> > >> taking ~20 seconds or whatever to handle those messages.  It's 
> > >> really a question of how unresponsive you want to permit the OSDs to be 
> > >> before you consider it a failure..
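> > >> Something like this in ceph.conf would do it (just an example value; 
> > >> 20s is the default, so pick whatever level of unresponsiveness you are 
> > >> willing to tolerate):
> > >> 
> > >>   [osd]
> > >>       osd heartbeat grace = 30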
> > >> 
> > >> sage
> > >> 
> > >> 
> > > 
> > > It might be worth testing out how long pings or other network traffic are 
> > > taking during these tests.  There may be some tcp tuning you can do 
> > > here, or even consider using a separate network for the mons.
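> > > 
> > > For instance (just an illustration of one way to do it), on each OSD host 
> > > during the run:
> > > 
> > >   ping -i 1 -D <peer cluster ip> | tee ping-$(hostname).log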
> > > 
> > > Mark
> > 
