Hi all,

I notice that the AOE driver uses a network congestion algorithm that tries to limit the number of outstanding requests on the wire. An implementation like this makes a lot of sense for congested networks and for cases where more than one host may be accessing a particular target.
However, I am seeing a lot of cases (depending on the load) where this algorithm is confused by the time it takes for the response to come back from the target. Since there is no network-level ACK in the AOE protocol, the time it takes for a response to come back depends entirely on how long the request sits in the target's buffer queues, on disk speed, and so on. A request that happens to hit cache can come back very quickly, while other requests can be quite slow.

I am not sure how to fix this. I wrote a simple patch to the Linux AOE driver to specify "aoe_minout", and saw a slight improvement in latency under concurrent load, but overall throughput wasn't noticeably changed (based on not-so-accurate munin graphs). However, with either the patched or the stock driver, /dev/etherd/err shows a lot of "retransmit" errors and matching "unexpected rsp" errors, which are probably not helping performance if they actually result in multiple rewrites to disk.

It would seem to me that the best solution would be to actually have ACKs in the protocol, but that would be a pretty drastic change.

Hmm... Comments? Ideas?
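To illustrate the timing problem, here is a rough sketch. It is not the driver's code: it assumes a TCP-style smoothed round-trip estimator with a timeout of srtt + 4 * rttvar, and it feeds that estimator response times that are mostly fast cache hits with the occasional slow disk read. The names `simulate` and `bimodal_workload`, and the 0.2 ms and 12 ms figures, are made up for the example.

```python
import random


def simulate(service_times_ms, alpha=0.125, beta=0.25, k=4.0):
    """Count responses that arrive after a TCP-style retransmit timeout.

    The srtt/rttvar update is the classic smoothed-RTT scheme. It is an
    assumption chosen for illustration, not the aoe driver's estimator.
    """
    srtt = service_times_ms[0]
    rttvar = srtt / 2.0
    spurious = 0
    for rtt in service_times_ms[1:]:
        timeout = srtt + k * rttvar
        if rtt > timeout:
            # Nothing was lost; the target was just slow. The initiator
            # has already resent, so the original reply arrives "unexpected".
            spurious += 1
        rttvar = (1 - beta) * rttvar + beta * abs(srtt - rtt)
        srtt = (1 - alpha) * srtt + alpha * rtt
    return spurious


def bimodal_workload(n, hit_ms=0.2, miss_ms=12.0, hit_ratio=0.9, seed=1):
    """Hypothetical mix of cache hits and disk reads on the target."""
    rng = random.Random(seed)
    return [hit_ms if rng.random() < hit_ratio else miss_ms for _ in range(n)]


if __name__ == "__main__":
    print("steady :", simulate([5.0] * 1000))
    print("bimodal:", simulate(bimodal_workload(1000)))
```

With a steady response time the timer never fires early. With the mixed workload the estimate settles near the fast case, so slow but perfectly healthy responses are treated as losses, which is the pattern the retransmit / unexpected rsp pairs suggest.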
Simon-

Example /dev/etherd/err snippet, which scrolls at about 15 lines per second during a typical backup run:

retransmit e6.1 oldtag=3ac07...@12332781a newtag=3cae781a s=0015c5e92481 d=003048d6026e nout=25
retransmit e6.1 oldtag=3ac37...@12332781a newtag=3caf781a s=0015c5e92481 d=003048d6026e nout=26
retransmit e6.1 oldtag=3ac17...@12332781a newtag=3cb0781a s=0015c5e92481 d=003048d6026e nout=26
retransmit e6.1 oldtag=3ac47...@12332781a newtag=3cb1781a s=0015c5e92481 d=003048d6026e nout=27
retransmit e6.1 oldtag=3ac27...@12332781b newtag=3cb2781b s=0015c5e92481 d=003048d6026e nout=27
retransmit e6.1 oldtag=3ac57...@12332781b newtag=3cb3781b s=0015c5e92481 d=003048d6026e nout=28
retransmit e6.1 oldtag=3cb07...@1233278fc newtag=3d9b78fc s=0015c5e92481 d=003048d6026e nout=15
retransmit e6.1 oldtag=3cb37...@1233278fc newtag=3d9c78fc s=0015c5e92481 d=003048d6026e nout=16
retransmit e6.1 oldtag=3cae7...@123327906 newtag=3d9d7906 s=0015c5e92481 d=003048d6026e nout=16
retransmit e6.1 oldtag=3cb17...@123327906 newtag=3d9e7906 s=0015c5e92481 d=003048d6026e nout=17
retransmit e6.1 oldtag=3caf7...@123327906 newtag=3d9f7906 s=0015c5e92481 d=003048d6026e nout=17
retransmit e6.1 oldtag=3cb27...@123327912 newtag=3da07912 s=0015c5e92481 d=003048d6026e nout=17
unexpected rsp e6.1 tag=3ac47...@1233279b3 s=003048d6026e d=0015c5e92481
unexpected rsp e6.1 tag=3ac37...@1233279b3 s=003048d6026e d=0015c5e92481
unexpected rsp e6.1 tag=3ac57...@1233279b3 s=003048d6026e d=0015c5e92481
unexpected rsp e6.1 tag=3caf7...@1233279b3 s=003048d6026e d=0015c5e92481
unexpected rsp e6.1 tag=3cb17...@1233279b3 s=003048d6026e d=0015c5e92481
unexpected rsp e6.1 tag=3cb37...@1233279b3 s=003048d6026e d=0015c5e92481
unexpected rsp e6.1 tag=3cae7...@1233279b3 s=003048d6026e d=0015c5e92481
unexpected rsp e6.1 tag=3ac17...@1233279b3 s=003048d6026e d=0015c5e92481
unexpected rsp e6.1 tag=3cb07...@1233279b3 s=003048d6026e d=0015c5e92481
unexpected rsp e6.1 tag=3ac07...@1233279b3 s=003048d6026e d=0015c5e92481
unexpected rsp e6.1 tag=3ac27...@1233279b3 s=003048d6026e d=0015c5e92481
unexpected rsp e6.1 tag=3cb27...@1233279b3 s=003048d6026e d=0015c5e92481
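For anyone who wants to check whether the "unexpected rsp" lines really pair up with earlier retransmits, a small script along these lines might help. It is only a sketch: the regular expressions are inferred from the lines shown above, `summarize` is a made-up name, and because the tags in this snippet are truncated, the match uses only the leading hex digits that are visible, so it is a heuristic.

```python
import re
import sys
from collections import Counter

# Patterns inferred from the snippet above; other driver versions may differ.
RETRANSMIT = re.compile(r"^retransmit (\S+) oldtag=([0-9a-f]+)\S* newtag=([0-9a-f]+)")
UNEXPECTED = re.compile(r"^unexpected rsp (\S+) tag=([0-9a-f]+)")


def summarize(lines):
    """Count both message types and how many late replies match a resent tag."""
    counts = Counter()
    resent = set()   # (device, visible oldtag prefix) that was retransmitted
    late = set()     # (device, visible tag prefix) that arrived unexpectedly
    for line in lines:
        line = line.strip()
        m = RETRANSMIT.match(line)
        if m:
            counts["retransmit"] += 1
            resent.add((m.group(1), m.group(2)))
            continue
        m = UNEXPECTED.match(line)
        if m:
            counts["unexpected rsp"] += 1
            late.add((m.group(1), m.group(2)))
    counts["matched"] = len(resent & late)
    return counts


if __name__ == "__main__":
    # Usage: python summarize_err.py < saved_err_log.txt
    for key, value in summarize(sys.stdin).items():
        print(f"{key}: {value}")
```

If most unexpected responses match a retransmitted tag, that would support the idea that the target answered the original request late rather than the frame being lost. In the snippet above, every visible tag prefix among the twelve unexpected responses also appears as an oldtag in the twelve retransmits.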