My .02 below...

-----Original Message-----
From: Ed Cashin [mailto:ecas...@coraid.com] 
Sent: Friday, September 6, 2013 10:11 AM
To: Derick Swanepoel
Cc: aoetools-discuss@lists.sourceforge.net
Subject: Re: [Aoetools-discuss] Poor performance on 10 Gbps SAN

> That said, while checking the vblade README for the design goals, I noticed
> that it advertises a capacity for 16 outstanding commands.  If you want to
> try some tuning, you could adjust Bufcount in dat.h and then make sure your
> settings in /proc are sufficient to allow the kernel to buffer 16 writes.
> (Read commands are small.)

Back when I was tuning AoE for our virtualization project, I learned the hard
way to be careful with the aoe_maxout parameter of the Linux driver.  For
example, with a single target and a single initiator, 16 may be appropriate,
but with 4 hosts sharing one target you can quickly overrun the target's
command buffers if all hosts are doing I/O at once.  I settled on
aoe_maxout="8" as a compromise between stability and performance.
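
To put rough numbers on that, here is a back-of-the-envelope sketch in Python.
It assumes every initiator keeps up to aoe_maxout commands in flight to the
same target at the same moment, and that the target has the 16 command slots
mentioned in the README quote above.  The function names are made up for
illustration; nothing here comes from the driver or from vblade.

```python
def worst_case_outstanding(initiators, maxout):
    """Commands in flight if every initiator fills its window at once."""
    return initiators * maxout


def oversubscription(initiators, maxout, target_slots=16):
    """Ratio of worst-case demand to the target's command slots.

    target_slots=16 is an assumption taken from the vblade README figure
    quoted above; adjust it if you change Bufcount.
    """
    return worst_case_outstanding(initiators, maxout) / target_slots


for hosts, maxout in [(1, 16), (4, 16), (4, 8)]:
    ratio = oversubscription(hosts, maxout)
    print(f"{hosts} host(s), aoe_maxout={maxout}: {ratio:.1f}x the target's slots")
```

Note that 4 hosts at aoe_maxout=8 can still ask for more than 16 slots in the
worst case, which is why I call it a compromise rather than a guarantee.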

Long command queues can also wreak havoc with the Linux aoe driver's RTT
calculations, leading to unnecessary retransmits: average round-trip times
inevitably go up once the queue length grows past the point where your array
can perform I/O in parallel.  Retransmits will of course lower your throughput
and decrease the efficiency of your network.  With hardware flow control, we
found that very few packets, if any, are completely lost.  The most likely way
to lose commands is to send more to the target than it can queue at once, for
example by overrunning the kernel socket buffers.
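
To illustrate the RTT point, here is a small Python sketch of a generic
TCP-style smoothed RTT estimator (running average plus running deviation).
This is not the aoe driver's code, and the gains (1/8, 1/4) and the timeout
formula are textbook assumptions rather than the driver's actual constants.
It only shows the mechanism: when queueing suddenly inflates response times,
the first slow reply arrives after the timer has already fired, so the command
is retransmitted even though nothing was lost.

```python
class RttEstimator:
    """Generic smoothed RTT estimator; an illustration, not the aoe driver."""

    def __init__(self, first_sample_us):
        self.avg = float(first_sample_us)
        self.dev = first_sample_us / 2.0

    def timeout(self):
        # Assumed rule: average plus four deviations.
        return self.avg + 4.0 * self.dev

    def update(self, sample_us):
        err = sample_us - self.avg
        self.avg += err / 8.0                    # assumed gain of 1/8
        self.dev += (abs(err) - self.dev) / 4.0  # assumed gain of 1/4


def count_spurious(samples_us):
    """Count replies that arrived later than the timeout in force."""
    est = RttEstimator(samples_us[0])
    spurious = 0
    for sample in samples_us:
        if sample > est.timeout():
            spurious += 1  # the timer fired before this reply came back
        est.update(sample)
    return spurious


# Hypothetical numbers: 1 ms replies, then the queue backs up to 10 ms.
steady = [1000] * 50
backed_up = steady + [10000] * 5
print(count_spurious(steady), count_spurious(backed_up))
```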

As you're testing the stack, pay close attention to network statistics as well 
as block statistics.  I also found the "debug" output of the aoe driver useful, 
e.g.:

# pwd
/sys/block/etherd!e1.0

# cat debug
rttavg: 58042 rttdev: 58472
nskbpool: 2
kicked: 868170
maxbcnt: 8704
ref: 0
falloc: 80
ffree: ffff88005c36ad80
003048b96515:1:8:8
        ssthresh:4
        lost:4133967
        taint:0
        r:1859180367
        w:643159042
        eth5
falloc: 82
ffree: ffff8800630d9a80
003048b96514:3:8:8
        ssthresh:4
        lost:4122252
        taint:0
        r:1863836699
        w:655038947
        eth4

The driver source code is small and easy to read, and explains what each of
these measurements means.  (In this example we have a pair of 1 Gbps links
splitting the load.  We've reached ~180 MB/s on sequential operations.  Our aoe
driver is v7.5, which was current at the time.)
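
If you want to watch those counters over time, a small parser helps.  The
Python sketch below assumes the debug file looks like the sample above
(unindented "key: value" lines for the device, a line starting with a MAC
address for each target, and indented lines for that target's counters and
interface names).  The layout may well differ between driver versions, so
treat the parsing rules as my guess from this one sample and not as a
documented format.

```python
import re

PAIR_RE = re.compile(r"(\w+):\s*(\S+)")
TARGET_RE = re.compile(r"^[0-9a-f]{12}:")  # assumed: target lines start with a MAC

# Shortened copy of the output shown above.
SAMPLE = """\
rttavg: 58042 rttdev: 58472
maxbcnt: 8704
falloc: 80
ffree: ffff88005c36ad80
003048b96515:1:8:8
        ssthresh:4
        lost:4133967
        r:1859180367
        w:643159042
        eth5
"""


def parse_aoe_debug(text):
    """Split debug text into device-level values and per-target dicts."""
    device, targets, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        indented = raw[0].isspace()
        if not indented and TARGET_RE.match(line):
            current = {"id": line, "ifaces": []}
            targets.append(current)
        elif indented and current is not None:
            pairs = PAIR_RE.findall(line)
            if pairs:
                current.update(pairs)
            else:
                current["ifaces"].append(line)  # bare word, e.g. eth5
        else:
            # Keys such as falloc can repeat, so keep a list per key.
            for key, value in PAIR_RE.findall(line):
                device.setdefault(key, []).append(value)
    return device, targets


device, targets = parse_aoe_debug(SAMPLE)
for t in targets:
    print(t["id"], t["ifaces"], "lost:", t.get("lost"))
```

On a live system you would pass in the contents of the debug file itself
(/sys/block/etherd!e1.0/debug in my case) instead of SAMPLE, and compare
successive readings.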

-Jeff


