Re: [Aoetools-discuss] vblade-22-rc1 is first release candidate for version 22

2014-07-15 Thread David Linda Leach

Killer,

That confirms some of my suspicions. In my testing I can see requests 
for 1024 sectors (512K) of data from the hard drive, which the AoE 
client has to carve up into individual read/write requests that fit 
into an AoE packet. At the server, each of these appears as an 
individual read/write request of the disk, so if you followed the 
optimal Ethernet packet usage for AoE you would end up with jumbo 
frame requests for 17 sectors each. The initial 1024 sector request at 
the client would start on an aligned boundary for the first two 4k 
sectors but then have a trailing 512 byte sector, which causes the 
next 7 requests to start unaligned and end aligned... so a 1024 sector 
request from the host OS results in only 1 out of 8 requests starting 
on an aligned boundary.
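
For illustration only (this is not vblade code), the 1-in-8 figure can 
be reproduced by simulating the carving, assuming 17 sectors per 
jumbo-frame request and a 4k (8-sector) physical sector:

#include <stdio.h>

int main(void)
{
    const unsigned total = 1024; /* sectors in the host request (512K) */
    const unsigned chunk = 17;   /* sectors per jumbo-frame AoE request */
    const unsigned align = 8;    /* 8 * 512 bytes = one 4k physical sector */
    unsigned aligned = 0, count = 0;

    for (unsigned lba = 0; lba < total; lba += chunk, count++)
        if (lba % align == 0)
            aligned++;

    /* prints "61 requests, 8 aligned": roughly 1 in 8, as described */
    printf("%u requests, %u aligned\n", count, aligned);
    return 0;
}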


Since the AoE client driver is handling disk requests from the host OS, 
the host OS is going to assume certain things about the disk and try to 
issue properly aligned requests. I think I've even seen that if an 
application is going to write to sector 1, the host will read (or page) 
in the 4k chunk starting at sector 0 and then write out that 4k chunk 
with the modification to sector 1.


It seems like if we wanted to ensure alignment and still support this 
configurable max sector count, the size we would want would be 16, to 
keep these large requests aligned and to ensure maximum efficiency for 
disk usage at the server. But this goes back to some of my original 
questions:


1) What is the test setup to determine the results of changing the max 
request size?

2) How does one measure latency and responsiveness?

David


Re: [Aoetools-discuss] vblade-22-rc1 is first release candidate for version 22

2014-07-15 Thread Ed Cashin
On 07/15/2014 01:29 AM, David Leach wrote:
 Ed,

 I'm less concerned about the initiator side as we don't really have 
 direct control over what it requests. What I suggest is that these 
 requests from the host on the initiator will likely be aligned, due to 
 how the file system works to keep things efficient. If we then cause 
 the resulting AoE requests to the server to be unaligned accesses, 
 that will likely cause additional IO transactions to the file system, 
 which would then likely cause latency delays on the responses to these 
 requests.

As long as you don't specify the sync or direct options, though, the 
vblade will write to a buffered backing store.  Then the ultimate 
backing store (e.g., disk drive), the ultimate driver (e.g., SCSI 
layer), the block layer, the middle layer (e.g., dm and md), the VM 
subsystem and (if it's a file) the filesystem will get a chance to merge 
and align I/O.
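
To make the distinction concrete, here is a minimal sketch (not 
vblade's actual source) of the difference between a buffered backing 
store and the sync/direct options mentioned above:

#define _GNU_SOURCE            /* for O_DIRECT on Linux */
#include <fcntl.h>

int open_backing_store(const char *path, int use_direct, int use_sync)
{
    int flags = O_RDWR;

    if (use_direct)
        flags |= O_DIRECT;     /* bypass the page cache: I/O hits the
                                  device in the chunks the target issues */
    if (use_sync)
        flags |= O_SYNC;       /* each write reaches stable storage
                                  before write() returns */
    /* With neither flag, writes land in the page cache and the VM,
       block layer, and filesystem are free to merge and align them. */
    return open(path, flags);
}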

-- 
   Ed



Re: [Aoetools-discuss] vblade-22-rc1 is first release candidate for version 22

2014-07-14 Thread Killer{R}
Hello David,

Monday, July 14, 2014, 1:23:56 AM, you wrote:

IMHO the problem is caused not only by page size, but also by the HDD's
sector size. Nowadays HDDs have a 4K physical sector size. They still
support 512 byte accesses, but this is inefficient, because every
unaligned read that doesn't fit into a 4K sector results in a 4K read,
and every unaligned write causes the disk to read the sector's data,
modify it internally in its buffers, and then write it back.
Sure, the firmware tries to do this in the fastest way, but my tests
show about a 20..30% sequential write speed degradation (with O_DIRECT)
when writing 4K blocks whose beginnings are not also aligned to 4K. So
simply using jumbo frames is not enough to make the hardware work as
fast as it can.
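
For reference, a rough sketch of that kind of O_DIRECT experiment
(illustrative only, not the actual benchmark; /dev/sdX is a placeholder
for a scratch device, and error checking is omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void sequential_write(int fd, off_t start, size_t blocks)
{
    void *buf;
    posix_memalign(&buf, 4096, 4096); /* O_DIRECT wants aligned memory */
    memset(buf, 0xab, 4096);
    for (size_t i = 0; i < blocks; i++)
        pwrite(fd, buf, 4096, start + (off_t)i * 4096);
    free(buf);
}

int main(void)
{
    /* WARNING: overwrites the device -- use a scratch disk only */
    int fd = open("/dev/sdX", O_WRONLY | O_DIRECT);

    sequential_write(fd, 0, 25600);   /* 100M of 4K-aligned writes */
    sequential_write(fd, 512, 25600); /* same I/O shifted by one sector */
    close(fd);                        /* time each phase separately */
    return 0;
}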
The AoE protocol doesn't support 4K sectors directly, because it has to
support 'normal' MTUs, not only jumbo frames. However, it's
theoretically possible to make the initiator report to the OS that it
is a '4K sector drive', and a proper ('4K sector aware' :) ) OS will
then access it in 4K-aligned portions, which together with some
buffering at the target's side should make it all work faster :). But
it all looks like a tricky workaround.



-- 
Best regards,
 Killer{R}  mailto:supp...@killprog.com




Re: [Aoetools-discuss] vblade-22-rc1 is first release candidate for version 22

2014-07-14 Thread Ed Cashin
On 07/13/2014 06:23 PM, David Leach wrote:
 So I do find it interesting to have a configuration to limit the size 
 of the read/write request, but it seems like it would be useful to 
 understand the side effects and why someone would want to do this. 
 Catalin suggested that reducing the size of the jumbo frames decreases 
 latency and improves boot-times and said that the system feels more 
 responsive. This is where I have a problem, though, because something 
 feeling more responsive is not very satisfying. It would be better 
 to have some hard numbers behind what this change does.

Yes, I agree.  If Catalin posts the patch here, then perhaps any 
interested parties would be able to gather some data.

[Leach correctly notes that some jumbos carry ...]
 17 sectors of data per request.

There is often a lot going on there.  For example, if the initiator host 
is using a filesystem, then writes will dirty pages of memory that are 
buffering the data from the AoE device.  The virtual memory subsystem 
will flush that data when it gets around to it, using whatever chunks it 
likes, then the block layer will probably consolidate or split the I/O 
as it likes inside the I/O scheduler, and only then will the aoe 
initiator get the data.

But the aoe driver will set up network buffers (sk_buff structures) that 
point right into the memory associated with the I/O.  The network card 
itself often does the transfer from RAM into the card and vice versa.  
I'm not sure there's a significant penalty paid for telling the NIC to 
DMA seventeen sectors.  It would be a good test to do in the aoe driver 
with a few different representative NICs.

Further, on the target side, there's no guarantee that the target will 
do the I/O in exactly the same chunks that appear in the AoE packets.  
Even disk drives have elevator algorithms scheduling I/O from write buffers.

I agree that test results here would be interesting, but a big "Your 
Mileage May Vary" should accompany the results.

-- 
   Ed



Re: [Aoetools-discuss] vblade-22-rc1 is first release candidate for version 22

2014-07-13 Thread David Leach
So I do find it interesting to have a configuration to limit the size of
the read/write request, but it seems like it would be useful to understand
the side effects and why someone would want to do this. Catalin suggested
that reducing the size of the jumbo frames decreases latency and improves
boot-times and said that the system feels more responsive. This is where I
have a problem, though, because something feeling more responsive is not
very satisfying. It would be better to have some hard numbers behind what
this change does.

AoE using normal Ethernet frames ends up having a protocol efficiency of
only 89.82%, which on 1Gb Ethernet would give you a theoretical maximum
throughput of ~112 MB/s. Going up to a 9000 byte frame bumps the efficiency
to 98.68% and a theoretical max throughput of ~123 MB/s. Something
interesting about jumbo frames, though, is that they end up being able to
request 17 sectors of data per request.
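
For what it's worth, the 17-sector figure falls out of simple
arithmetic. The sketch below assumes a 10 byte AoE header plus a 12
byte ATA header inside the Ethernet payload; check the AoE spec or the
vblade source for the exact sizes:

#include <stdio.h>

static unsigned sectors_per_frame(unsigned mtu)
{
    const unsigned hdr = 10 + 12;  /* AoE header + ATA command header */
    return (mtu - hdr) / 512;      /* whole 512 byte sectors that fit */
}

int main(void)
{
    printf("MTU 1500: %u sectors\n", sectors_per_frame(1500)); /*  2 */
    printf("MTU 9000: %u sectors\n", sectors_per_frame(9000)); /* 17 */
    return 0;
}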

Why is this interesting? Because on some Linux systems the page size is
4096 bytes, or 8 sectors, so 17 sectors works out to 2 full pages plus a
piece of a third page. If you are not using direct IO but instead letting
Linux manage the underlying file system, then it would seem you will end up
making unaligned IO requests of the system, causing additional I/Os to be
issued. This might be the reason for the latency effects, and it would be
interesting to get the numbers that Catalin may have from his tests... I
wouldn't mind seeing results for 17, 16, and 8 sector count requests.

But what I don't understand is that if the throughput is 80 MB/s and drops
to 60 MB/s as Catalin suggests, then I don't get how a 20 MB/s drop in
throughput would make the system more responsive... I also don't
understand what the test setup would be to even measure the effects on
latency and throughput and have them correlate to responsiveness.

David


Re: [Aoetools-discuss] vblade-22-rc1 is first release candidate for version 22

2014-07-11 Thread Ed Cashin
Catalin Salgau, hi.

Did you send a patch for this packet-size-tuning feature?  It seems like 
it would be a nice patch for contrib/ if you could put a nice 
description at the top, including its motivation and your personal 
experiences.  During testing you might even get a chance to jot down 
some details about specific performance differences, to motivate 
potential users to try the patch.

(I'm composing this in Thunderbird, and I hope I'm not going to send 
HTML mail!)

On 06/14/2014 11:59 PM, Catalin Salgau wrote:
 On 15/06/2014 4:06 AM, Ed Cashin wrote:
 Hi, Catalin Salgau.

 I have questions below between selected quotes.

 On 2014-06-10 07:59, Catalin Salgau wrote:
 I would like to request two changes before release.
 - An option to restrict the size of packets over automatic detection of
 MTU.
 You mean like if the MTU is 9000, you want the ability to tell the
 vblade to act like it's smaller, right?
 Yes. That's the gist of it.
 I believe there is some value in the ability to manually tweak the
 maximum packet size used by vblade.
 At the very least it would help with determining optimal parameters for
 a deployment/use case.
 If you have some numbers to share (MTUs and packet sizes as well as
 throughput rates and latencies), that would fill out your interesting
 story with important details.
 I sadly made no effort to document it, and, in retrospect, it might have
 made for an interesting study..
 As 'methodology', the target was configured to support 9014 byte frames
 and the initiator was switched between the two tested packet sizes on
 Windows XP x86 and Windows 7 amd64.
 Due to some problems with making these changes to multiple test images,
 I haven't replicated the results over a larger set of machines.
 I'm hoping to get back to this in a week or two.
 ...
 - change to vblade(8) manual Synopsis section to include current syntax
 That change might be simple enough for a patch to be easier for me to
 understand than a description, so please send the patch if you don't
 mind.
 This was probably poorly worded.
 What I was trying to say was that the Synopsis section in the manual
 page supplied with vblade has not been kept in sync with options added
 to vblade.
 Since changing this as suggested below would yield a line longer than 80
 chars, I'm not providing a real patch; I don't know how one should
 format this. Use it as reference, maybe?

 --- a/vblade.8
 +++ b/vblade.8
 @@ -6,1 +6,1 @@
 -.B vblade [ -m mac[,mac...] ] shelf slot netif filename
 +.B vblade [-b bufcnt] [-o offset] [-l length] [-dsr] [ -m mac[,mac...] ] shelf slot netif filename


-- 
   Ed




Re: [Aoetools-discuss] vblade-22-rc1 is first release candidate for version 22

2014-07-09 Thread Ed Cashin
On 2014-07-07 20:18, Ed Cashin wrote:
 On 2014-06-15 07:08, Catalin Salgau wrote:
 ...
 Legacy note:
 Negotiation is, I believe, a reminiscence from before the Query Config
 Information 'Sector Count' field was added, in AoEr9.
 But I would argue that this was invalid behaviour in AoEr8 (and maybe
 previously. I was unable to find previous revisions. @Ed some help
 here?)
 
 I'm working on this.

I think that 8 is the first publicly released revision.

-- 
   Ed Cashin ed.cas...@acm.org



Re: [Aoetools-discuss] vblade-22-rc1 is first release candidate for version 22

2014-06-15 Thread Catalin Salgau
On 15/06/2014 2:48 PM, Killer{R} wrote:
 Hello Catalin,

 Sunday, June 15, 2014, 6:59:54 AM, you wrote:


 I would like to request two changes before release.
 - An option to restrict the size of packets over automatic detection of
 MTU.
 You mean like if the MTU is 9000, you want the ability to tell the
 vblade to act like it's smaller, right?
 CS Yes. That's the gist of it.
 CS I believe there is some value in the ability to manually tweak the
 CS maximum packet size used by vblade.
 But its all to initiator side. Actually for example WinAoE (and its
 forks ;) ) does MTU 'autodetection' instead of using Conf::scnt.

That's not entirely correct.
WinAoE indeed does a form of negotiation there - it will start at 
(MTU/sector size) and will do reads of decreasing size, until it 
receives a valid packet.
However! If you would kindly check ata.c:157 (on v22-rc1) any ATA 
request for more than the supported packet size will be refused.
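
For readers without the source handy, the check being referred to has
roughly this shape (illustrative only; the names are invented and this
is not the actual ata.c code):

/* maxscnt: sectors per response frame the target supports (2 on a
 * 1500 MTU, 17 on a 9000 MTU); scnt: sector count in the ATA command */
static int ata_request_fits(unsigned scnt, unsigned maxscnt)
{
    return scnt > 0 && scnt <= maxscnt; /* otherwise reply with an error */
}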

Legacy note:
Negotiation is, I believe, a reminiscence from before the Query Config 
Information 'Sector Count' field was added, in AoEr9.
But I would argue that this was invalid behaviour in AoEr8 (and maybe 
previously. I was unable to find previous revisions. @Ed some help here?)
While there was no outright ban on it up front (it was only textually
explained that a standard 1520 byte MTU would limit device ATA commands
to two sectors, and servers were not required to understand emitted ATA
commands), the ATA 'Sector Count' is explicitly constrained to 0, 1 or 2.
This effectively makes negotiating larger sector counts invalid, both in
method and in result.
The code does, however, have the nice side effect of Path MTU detection
if, for example, your networking equipment is not configured for, or
capable of, handling jumbo frames (or, who knows, some weird networking
equipment that takes RFC 791 at the minimum and has a maximum supported
datagram size as low as 68 bytes. :) )

Cheers!



Re: [Aoetools-discuss] vblade-22-rc1 is first release candidate for version 22

2014-06-15 Thread Killer{R}
Hello Catalin,

Sunday, June 15, 2014, 4:08:15 PM, you wrote:

 I would like to request two changes before release.
 - An option to restrict the size of packets over automatic detection of
 MTU.
 You mean like if the MTU is 9000, you want the ability to tell the
 vblade to act like it's smaller, right?
 CS Yes. That's the gist of it.
 CS I believe there is some value in the ability to manually tweak the
 CS maximum packet size used by vblade.
 But its all to initiator side. Actually for example WinAoE (and its
 forks ;) ) does MTU 'autodetection' instead of using Conf::scnt.

CS That's not entirely correct.
CS WinAoE indeed does a form of negotiation there - it will start at 
CS (MTU/sector size) and will do reads of decreasing size, until it 
CS receives a valid packet.
CS However! If you would kindly check ata.c:157 (on v22-rc1) any ATA 
CS request for more than the supported packet size will be refused.

That's also not entirely correct :) It increases the sector count from
1 up to either the MTU limit or any kind of error from the target,
including a timeout.
However, in my investigation I found that it's also useful for the
initiator to know the value that vblade calls the 'buffer count'... I
mean the number of packets the initiator can send to the target knowing
that it will likely process them all, because sending more outstanding
requests than this value sharply increases the drop (and resend) rate.
I also implemented a kind of negotiation to detect this, by sending a
'congestion' extension command that makes the target usleep(50) and
then respond to all commands received in its buffer. Compared with
directly asking the target for its buffer count, this approach also
detects any implicit buffering between the initiator and the target.
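
As far as one can tell from that description, the target side of this
(hypothetical, out-of-tree) extension would look roughly like the
sketch below; every identifier in it is invented, not the real code:

#include <stddef.h>
#include <unistd.h>

struct request;                              /* opaque, for the sketch */
struct request *next_buffered_request(void); /* assumed helper */
void send_reply(struct request *r);          /* assumed helper */

int handle_congestion_probe(void)
{
    struct request *r;
    int n = 0;

    usleep(50);                 /* let any in-flight requests queue up */
    while ((r = next_buffered_request()) != NULL) {
        send_reply(r);          /* answer everything currently buffered */
        n++;
    }
    /* The burst of n replies, as seen by the initiator, approximates
     * how many outstanding packets the path and target really buffer. */
    return n;
}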




-- 
Best regards,
 Killer{R}  mailto:supp...@killprog.com




Re: [Aoetools-discuss] vblade-22-rc1 is first release candidate for version 22

2014-06-15 Thread Catalin Salgau
Hi again!
I like my long emails, don't I?

On 15/06/2014 4:22 PM, Killer{R} wrote:
 Hello Catalin,

 Sunday, June 15, 2014, 4:08:15 PM, you wrote:

 I would like to request two changes before release.
 - An option to restrict the size of packets over automatic detection of
 MTU.
 You mean like if the MTU is 9000, you want the ability to tell the
 vblade to act like it's smaller, right?
 CS Yes. That's the gist of it.
 CS I believe there is some value in the ability to manually tweak the
 CS maximum packet size used by vblade.
 But its all to initiator side. Actually for example WinAoE (and its
 forks ;) ) does MTU 'autodetection' instead of using Conf::scnt.

 CS That's not entirely correct.
 CS WinAoE indeed does a form of negotiation there - it will start at
 CS (MTU/sector size) and will do reads of decreasing size, until it
 CS receives a valid packet.
 CS However! If you would kindly check ata.c:157 (on v22-rc1) any ATA
 CS request for more than the supported packet size will be refused.

 That's also not entirely correct :) It increases sectors count from
 1 to ether MTU limit, either any kind of error from target, including
 timeout.
You're probably right there. I haven't looked at it recently. In any 
event, the observation stands.
Changing the supported MTU in vblade will limit packets to that size (I 
wouldn't have bothered with the FreeBSD MTU detection code if that 
wasn't the case)
 However in my investigation I found that its usefull for initiator to
 know also value called in vblade as 'buffers count' .. I mean such
 a count of packets initiator can send to target knowing that it will
 likely process them all. Because sending more request than this value
 as 'outstanding' sharply increases drops (and resends) rate.
 I implemented also kind of negotiation to detect this by sending
 'congestion' extension command that does usleep(50) and the
 responds for all commands received in buffer. Such approach by
 comparing with directly asking target for buffers count will
 detect also any implicit buffering between initiator and target

As per the AoE spec, messages in excess of Buffer Count are dropped.
Since vblade processes these synchronously, this happens at the network 
buffer level. If using async I/O, you're responsible for that, in theory.
As far as I remember, WinAoE not only doesn't care about that, but 
doesn't even request this information from the target.
Should WinAoE limit the number of in-flight packets, as the target says
it should, we wouldn't actually be talking about this, but that would
probably cause more latency, since the initiator would have to wait for
confirmation of at least one packet before sending another one in
excess of bufcnt (and as I remember, WinAoE does not apply limits to
sending packets).
This would probably reduce throughput and increase average response
time under even moderate load, but decrease the drop rate.
I'm not actually sure that the drop/resend rate is something to aim
for. It's clearly desirable to minimise it, but not for the sake of the
number.
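
For illustration, an initiator that honoured bufcnt would look roughly
like the sketch below (names are invented; this is neither WinAoE nor
vblade code):

struct aoe_request;
void send_packet(struct aoe_request *r); /* assumed transmit helper */
void wait_for_any_reply(void);           /* assumed: blocks until one
                                            outstanding reply arrives */
static int outstanding;
static int bufcnt = 16;  /* as reported in Query Config (16 is an example) */

void submit(struct aoe_request *r)
{
    while (outstanding >= bufcnt)
        wait_for_any_reply();  /* stall until a slot frees up */
    send_packet(r);
    outstanding++;             /* decremented on reply or retransmit */
}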

Regarding your proposed extension, I could see something like this being
valuable in the event that the target can detect an increased drop rate
and inform the initiator to ease off or resend packets faster than the
default timeout. But since the target is not allowed to send unsolicited
packets to the initiator, a specific request would be needed (say, when
a large number of packets are outstanding), and this raises the
question: if those packets are being dropped, what is there to stop the
target's network stack from dropping our congestion detection packet?
On that note, vblade could be taught to broadcast load status
periodically or on a high drop rate, and initiators would notice that
and adapt, but I believe that this raises some security concerns and
would also slightly slow the target, since it would need to yield to
the kernel for the drop-rate information every few requests.

@Ed
Now, thanks to Killer's reference to the Buffer Count, I remember that
the FreeBSD code does not actually use it to allocate the network
buffers.
Under Linux, following setsockopt with the default bufcnt, the receive
buffer would end up 24000 bytes long for an MTU of 1500 bytes, and
144000 for a 9K MTU.
Under FreeBSD the code defaults to a 64K buffer. That makes the
effective bufcnt 43 on a 1500 byte MTU, but 7 on a 9K MTU.
This could cause an increase in dropped packets and explain the
decrease in throughput I mentioned in a previous mail. I did not check
for this when testing.
I was not concerned because multiple instances of vblade on the same
interface would saturate the channel anyway, but now I'm starting to
worry :)
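
For reference, the arithmetic behind those numbers (the default bufcnt
of 16 is inferred from 24000/1500 rather than quoted from the code, and
64K is taken as 65536 bytes):

#include <stdio.h>

int main(void)
{
    printf("Linux, MTU 1500:  16 * 1500 = %d bytes\n", 16 * 1500);
    printf("Linux, MTU 9000:  16 * 9000 = %d bytes\n", 16 * 9000);
    printf("FreeBSD, MTU 1500: 65536 / 1500 = %d packets\n", 65536 / 1500);
    printf("FreeBSD, MTU 9000: 65536 / 9000 = %d packets\n", 65536 / 9000);
    return 0;
}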



Re: [Aoetools-discuss] vblade-22-rc1 is first release candidate for version 22

2014-06-15 Thread Catalin Salgau
On 15/06/2014 6:40 PM, Killer{R} wrote:
 Hello Catalin,

 Sunday, June 15, 2014, 6:30:09 PM, you wrote:

 CS Hi again!
 CS I like my long emails, don't I?
 Yep :)

 About drop rate - it's not just my theoretical assumption that the
 drop rate must be minimized. I played with the WinAoE variable named
 OutstandingThreshold and found that IO performance is best when it is
 near the buffer count specified on vblade's command line (or, in the
 case of FreeBSD, the value that actually determines the buffered
 packet count).
 Also, there is another, for me still theoretical, aspect: if there are
 a lot of AoE targets/initiators sharing the same wire, it is
 definitely better to have the lowest possible resend rate.
That feature was not present in the old build, regrettably.
I'll post back on the drop-rate when I have a chance to test this on 
Monday maybe.
I was actually looking at adding more links or switching to 10G to get 
around this.
