er krishna wrote:
> You are using dd to test it. I will suggest that rather than using dd,

There's nothing wrong with dd for testing a disk if the block size and 
working set are sufficiently large. In this case, he's using 1M blocks, 
which is enough to cause multi-block writes. It's also an exact multiple 
of the page size (4096 bytes), which is important for avoiding the cache 
backfill penalty. The 1G working set, on the other hand, should probably 
be much larger (at least twice the physical RAM of the target *and* 
initiator combined, as both do their own caching -- yes, even with 
dropping the VM caches).
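
By way of illustration, something like the following is what I have in 
mind (a sketch only -- the device path is from Billy's setup, and the 
16G count is an assumption you'd scale to your actual RAM):

# Drop caches first, then read/write with large, page-aligned blocks.
# count=16384 (16G at bs=1M) is a placeholder -- size it to at least
# 2x the combined RAM of target and initiator.
sync; echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/etherd/e0.0 of=/dev/null bs=1M count=16384
# The write pass overwrites the device, as in Billy's script:
dd if=/dev/zero of=/dev/etherd/e0.0 bs=1M count=16384 conv=fsync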

> hdparm, iostat, go for some tools like fio, bonnie, iozone & iometer. 

hdparm isn't an effective, real-world test. The others, on the other 
hand, are much better choices, as they stress other areas of the I/O 
subsystem and the filesystem on top of the disk, which is equally 
important.
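
For example, a quick fio run along these lines exercises the same path 
with direct I/O and a real queue depth (a sketch; the device path, size, 
and libaio engine are my assumptions, not anything from Billy's setup):

# Sequential read with the page cache bypassed; swap --rw=write for the
# write test (which, like dd, will overwrite the device).
fio --name=seqread --filename=/dev/etherd/e0.0 --rw=read \
    --bs=1M --size=4g --direct=1 --ioengine=libaio --iodepth=4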

> Please do your experiment once with any of these tools and let us know 
> the results. I don't think iSCSI can give better throughput, because it 
> has tcp/ip overhead.

Not necessarily. The TCP/IP overhead of iSCSI is negligible in most 
cases and, in fact, not a factor at all when the underlying interface 
isn't saturated. A single disk of this make/model (or a pair of them) 
can't generate enough sustained data to saturate gigabit ethernet. The 
TCP/IP overhead of iSCSI will increase latency ever so slightly, but 
again, with these types of disks (and, really, most mechanical storage) 
such a small increase in latency is negligible.

This is especially so if the target and initiator both use jumbo frames, 
which are absolutely necessary for AoE to perform well. If jumbo frames 
aren't in use with AoE, then writes can never be a multiple of the page 
size, which will create cache backfill penalty reads on the initiator. 
iSCSI, being a higher-level protocol with partial write packet ordering, 
is immune to this: larger writes are broken up into smaller packets that 
fit under the MTU of the physical interface, then reassembled on the 
target as one large block. Since modern disks benefit greatly from an 
increased read/write block size, iSCSI should, in theory, afford better 
performance than AoE if you can find the "sweet spot" -- the block size 
at which the disk performs best. As you begin to saturate the interface, 
though, AoE should afford better performance, as its overhead is smaller 
for block writes of the same size (under AoE's MTU threshold). This, in 
my opinion, is one of AoE's biggest limitations: the inability to issue 
reads/writes larger than ~8K, because it relies on Ethernet frames as a 
hard packet size.
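
To put rough numbers on that (a back-of-the-envelope sketch -- the 
~22-byte AoE+ATA header figure is an approximation, and the Ethernet 
header itself doesn't count against the MTU):

# Sectors of payload that fit in one AoE frame at standard vs. jumbo MTU.
for mtu in 1500 9000; do
    sectors=$(( (mtu - 22) / 512 ))
    echo "MTU $mtu: $sectors sectors = $(( sectors * 512 )) bytes per frame"
done

That works out to roughly 1K per frame at the standard MTU and ~8.5K at 
9000, which (with implementations typically rounding down to 16 sectors) 
is where the ~8K ceiling comes from.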


> On Sat, Apr 11, 2009 at 4:34 AM, Billy Crook <billycr...@gmail.com 
>     I wrote a small script to compare a LW16800 (e0.0) and vblade (e9.0)
>     and iscsi to the same disk.  It reads/writes a GB from two WDC
>     WD1001FALS-00J7B0 drives ten times.  Both drives are sitting out of
>     chassis on the same shelf next to each other, with direct cooling.  I
>     drop_caches on the initiator and on the vblade and iscsi target just
>     before each test.
> 
>     iscsi and vblade export the same disk, /sdb.  Both the initiator and
>     target are otherwise idle when performing these tests.  The initiator,
>     and targets are connected to the same gigabit switch.  Here is my
>     script.

>     [r...@zero ~]# ./aoebenchtest
>     reading 1024M on /dev/etherd/e0.0 10 times
>     1073741824 bytes (1.1 GB) copied, 24.8476 s, 43.2 MB/s
[..snip similar results..]
>     writing 1024M on /dev/etherd/e0.0 10 times
>     1073741824 bytes (1.1 GB) copied, 20.6834 s, 51.9 MB/s
[..snip similar results..]

Whoa there cowboy! How is it that your *write* speed is greater than 
your *read* speed? This seems counter-intuitive, like a caching layer is 
getting in the way. Try bumping your working set size up considerably, 
or opening the device in O_DIRECT mode to avoid caching on the target.
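
On the initiator side, at least, dd's direct flags will take the page 
cache out of the equation (a sketch; the count is illustrative, and 
target-side caching is a separate matter that depends on how vblade or 
the iSCSI target open the backing device):

sync; echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/etherd/e0.0 of=/dev/null bs=1M count=4096 iflag=direct
# Write pass overwrites the device, as in the original script:
dd if=/dev/zero of=/dev/etherd/e0.0 bs=1M count=4096 oflag=direct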

>     reading 1024M on /dev/etherd/e9.0 10 times
>     1073741824 bytes (1.1 GB) copied, 20.5144 s, 52.3 MB/s
[..snip similar results..]
>     writing 1024M on /dev/etherd/e9.0 10 times
>     1073741824 bytes (1.1 GB) copied, 58.7555 s, 18.3 MB/s
[..snip similar results..]

You're not using jumbo frames, are you? You should be. That would also 
explain the result above, as I'm guessing the LW16800 re-blocks writes 
to avoid the caching penalty. Admittedly, though, I have no real 
knowledge of the device's capabilities ...

>     reading 1024M on
>     /dev/disk/by-path/ip-192.168.171.180:3260-iscsi-testingiscsi-lun-1 10
>     times
>     1073741824 bytes (1.1 GB) copied, 21.4272 s, 50.1 MB/s
[..snip similar results..]
>     writing 1024M on
>     /dev/disk/by-path/ip-192.168.171.180:3260-iscsi-testingiscsi-lun-1 10
>     times
>     1073741824 bytes (1.1 GB) copied, 17.38 s, 61.8 MB/s
[..snip similar results..]

Again, I find this curious.

You might want to alter your script so that it runs the commands that 
access the device, plus a sync of the buffers to disk, inside a 'time'd 
subshell, and then do the octets-per-second calculation yourself from 
that figure.
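
A minimal bash sketch of what I mean (the device path and 1G size are 
from your script; the TIMEFORMAT/bc plumbing is mine):

# Time dd *and* the sync together, then derive MB/s from wall-clock time.
TIMEFORMAT='%R'
secs=$( { time ( dd if=/dev/zero of=/dev/etherd/e0.0 bs=1M count=1024 \
                    2>/dev/null; sync ); } 2>&1 )
echo "1024 MB in $secs s = $(echo "1024 / $secs" | bc -l) MB/s"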

>     It looks like iSCSI is outperforming both vblade, and the hardware AoE
>     board with the exception of a 5MB/s gain vblade has over iscsi on the
>     same disk in the same target host.  Is there some flaw in my testing?
>     Can I improve AoE performance somehow?  What could make writes so slow
>     with vblade?  I'll switch the drives around and test again just to
>     make sure one of them isn't slower than the other.

There are a few things you can do to improve AoE performance.

First and foremost, enable jumbo frames throughout. This will have the 
biggest impact on performance for both AoE (especially) and iSCSI. If 
you don't have a switch capable of jumbo frames, buy one. :)
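
On Linux that's just a matter of raising the MTU on both ends (eth1 here 
is an assumption -- use whichever interface carries the storage traffic, 
and make sure the switch passes jumbo frames too):

ip link set eth1 mtu 9000    # or the older form: ifconfig eth1 mtu 9000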

Then, set the following sysctls to increase buffer space on raw sockets:

net.core.wmem_max = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.rmem_default = 262144
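
To apply them immediately (drop the same lines into /etc/sysctl.conf to 
make them persistent):

sysctl -w net.core.wmem_max=262144 net.core.rmem_max=262144 \
          net.core.wmem_default=262144 net.core.rmem_default=262144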

Next, re-run your tests with the modifications to the working size 
suggested above, and run the tests with varying blocksizes. Start out at 
512b, then increase exponentially (1024, 2048, 4096, 8192, 16384, 32768, 
65536, etc.) to see where you get the best performance.
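
Something like this loop would do it (a sketch -- the device path is 
your e0.0 export, and oflag=direct plus the cache drop are my additions 
to keep the initiator's cache out of the measurement; the small block 
sizes will take a while):

for bs in 512 1024 2048 4096 8192 16384 32768 65536; do
    sync; echo 3 > /proc/sys/vm/drop_caches
    echo -n "bs=$bs: "
    # dd prints its summary on stderr; pull out the final MB/s figure.
    dd if=/dev/zero of=/dev/etherd/e0.0 bs=$bs count=$(( 1073741824 / bs )) \
        oflag=direct 2>&1 | awk '/copied/ { print $(NF-1), $NF }'
done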

You'll see that write performance for both protocols on the Linux target 
will be abysmal below 4k blocks. I expect AoE to perform well with 4k 
and 8k blocks, but not to get any better with larger blocks (as they 
have to be broken up anyway). iSCSI, on the other hand, should increase 
in performance steadily up to 32k or 64k, depending on how the drive 
itself is optimized. The only outlier here is the hardware AoE device: 
I'm not sure what it does internally for writes, so it will be 
interesting to see how it behaves.


Regards,

-- 
Kelsey Hudson
Sr. Systems Administrator, DrJays.com
9180 Camino Santa Fe, San Diego, CA 92121
888.437.5297x134 (desk)    619.852.6374 (cell)

