On Sun, Sep 23, 2012 at 2:33 PM, Stefan Fuhrmann
<stefan.fuhrm...@wandisco.com> wrote:
> On Sat, Sep 22, 2012 at 7:13 PM, Johan Corveleyn <jcor...@gmail.com> wrote:
>>
>> On Sat, Sep 22, 2012 at 2:27 PM, <stef...@apache.org> wrote:
>> > Author: stefan2
>> > Date: Sat Sep 22 12:27:49 2012
>> > New Revision: 1388786
>> >
>> > URL: http://svn.apache.org/viewvc?rev=1388786&view=rev
>> > Log:
>> > On the 10Gb branch.
>> >
>> > * BRANCH-README: clarify goals and impact of this branch
>> >
>> > Modified:
>> >     subversion/branches/10Gb/BRANCH-README
>> >
>> > Modified: subversion/branches/10Gb/BRANCH-README
>> > URL: http://svn.apache.org/viewvc/subversion/branches/10Gb/BRANCH-README?rev=1388786&r1=1388785&r2=1388786&view=diff
>> > ==============================================================================
>> > --- subversion/branches/10Gb/BRANCH-README (original)
>> > +++ subversion/branches/10Gb/BRANCH-README Sat Sep 22 12:27:49 2012
>> > @@ -3,13 +3,19 @@ svn:// single-threaded throughput from a
>> >     10Gb/s for typical source code, i.e. becomes capable of
>> >     saturating a 10Gb connection.
>> >
>> > +Http:// will speed up by almost the same absolute value,
>> > +1 second being saved per GB of data. Due to slow processing
>> > +in other places, this gain will be hard to measure, though.
>>
>> Heh, next question: what are those "slow places" mainly, and do you
>> have any ideas to speed those up as well? Are there (even only
>> theoretical) possibilities here? Or would that require major
>> revamping? Or is it simply theoretically impossible to overcome
>> certain bottlenecks?
>
> It is not entirely clear yet where that overhead comes from.
> However:
>
> * The textual representation is not a problem - there is no
>   significant data overhead in HTTP. Base64 encoding has
>   been a limiting factor in the past and can certainly be tuned
>   further if need be.
> * IIRC, we use the same reporter at the same granularity, and
>   the server pushes a whole file tree out to the client with no
>   need for extra roundtrips. But I may be mistaken here.
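Side note: the base64 point is easy to sanity-check. A minimal
sketch, assuming Python's standard-library encoder and an arbitrary
64 MiB buffer (neither comes from this thread), gives a rough encode
rate:

    # Rough base64 encode-rate check; the buffer size is an arbitrary
    # assumption.
    import base64
    import os
    import time

    data = os.urandom(64 * 1024 * 1024)      # 64 MiB of random input

    start = time.perf_counter()
    encoded = base64.b64encode(data)         # one-shot encode
    elapsed = time.perf_counter() - start

    gbits = len(data) * 8 / 1e9
    print("encoded %.0f MiB in %.3fs -> %.1f Gb/s"
          % (len(data) / 2**20, elapsed, gbits / elapsed))

On typical hardware this lands in the multi-Gb/s range, which is
consistent with base64 not being the limiting factor here.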
With 1.8 there will only be ra_serf for http, and it issues a separate
HTTP GET for every file during a checkout or update. Those requests can
go in parallel, and in most setups (with KeepAlive enabled) the TCP
connections are reused, but there is still a certain overhead for every
HTTP request/response pair. There is no single giant streaming response
carrying the entire tree.

> Possible sources of extra load:
>
> * Apache modules packing / unpacking / processing
>   the outgoing data (HTTP/XML tree?)
> * Apache access control modules - even if there is
>   blanket access
> * Fine-grained network communication.
>
> The latter two are a problem because we want to transmit
> 40k files + properties per second.
>
> My gut feeling is that we can address most of the issues
> we will find, and that doubling the performance is virtually
> always possible. A stateless protocol like HTTP also
> makes it relatively easy to create parallel request streams
> to increase throughput.
>
> Another thing is that svnserve would be just fine for many
> use cases if only it had decent SSPI / LDAP support. But
> that is something we simply need to code. Power users
> inside a LAN could then use svnserve, while more flexible /
> complicated setups are handled by an Apache server on
> the same repository.

Ah yes. If somebody could "fix" the auth support in svnserve (in a way
that really works, as opposed to the current SASL support), that would
be great :-). It would open up a lot more deployment options.

> Finally, 1.8 clients are much too slow to do anything useful
> with that amount of bandwidth. Checksumming alone limits
> the throughput to ~3Gb/s (for export, since it only uses MD5)
> or even ~1Gb/s (checkout calculates both MD5 and SHA1).
>
> Future clients will hopefully do much better here.

Indeed. That would make the client the clear bottleneck again :-).
Besides, even if you can checksum at 3Gb/s, you still need some
seriously fast hardware to write to persistent storage at that speed
:-).

--
Johan
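P.S.: Those checksum figures are easy to reproduce. A minimal sketch,
assuming Python's hashlib and an arbitrary 256 MiB buffer (neither
comes from this thread), compares the export case (MD5 only) against
the checkout case (MD5 + SHA1):

    # Rough client-side checksum throughput check; the buffer size is
    # an arbitrary assumption.
    import hashlib
    import os
    import time

    data = os.urandom(256 * 1024 * 1024)    # 256 MiB of random input

    def gbits_per_sec(algorithms):
        """Digest `data` once per algorithm; return combined Gb/s."""
        digests = [hashlib.new(name) for name in algorithms]
        start = time.perf_counter()
        for d in digests:
            d.update(data)
        elapsed = time.perf_counter() - start
        return len(data) * 8 / 1e9 / elapsed

    print("export   (MD5 only):  %4.1f Gb/s" % gbits_per_sec(["md5"]))
    print("checkout (MD5+SHA1):  %4.1f Gb/s" % gbits_per_sec(["md5", "sha1"]))

The absolute numbers vary a lot with the CPU (newer chips accelerate
SHA-1 in hardware), but the cost of computing a second digest over the
same data shows up clearly.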