Lynn W. Taylor wrote:
> Martin wrote:
>> I rather like the idea of clients requesting a time slot from the 
>> project server for when they can upload their data and at what rate...

Upon further thought... Although that would be "very nice" and "fun" to 
try, it's also way OTT and overly elaborate for this scenario!


> Under no circumstances do you want to tell the client to slow down.  If 
> you are optimizing (from the server point of view) you want to get them 
> in, get them done, and get them out -- you want them at top speed, or 
> completely silent.
> 
> Telling the client to slow down is a self-DOS.

That depends on whether you are being killed by the link bandwidth being 
swamped, or whether you have excess bandwidth for the data and it is 
instead that you are being killed by the simultaneous connections count 
(and server resource limits being hit).

Matt has tried all manner of server tweaks to try to improve things. His 
comments suggest that the results are usually 'unexpected', 'random' and
'confusing'.

That suggests that the bottleneck is NOT on the servers that he is 
tweaking. There is some other controlling effect external to those servers.

I suspect it is all just randomness from dropped packets on a 
swamped link, followed by data amplification when the big 
uploads/downloads hit for whichever requests randomly get through.


> We know that individual clients have all kinds of different latencies 
> and connection speeds, but from a purely practical standpoint, if you 
> have 100 machines connected to a project server, chosen at random, 
> they'll average out at some median speed, close enough.

OK, to guess some numbers...

Uploads:

s...@h has a 100Mb/s pinch point on their link;

A typical UK (adsl) uplink is anything from 128kb/s up to 1Mb/s 
depending on the connection. Most I guess will be 512kb/s or 768kb/s. 
Add in a few dial-up 8kb/s, and then hope that there are no academic 
links that can blast out a full 100Mb/s...

That means that, as a "guesstimate", you don't want more than about 150 
to 200 simultaneous uploads. Note also that just one fast upload can 
skittle the lot into link saturation.
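A quick back-of-envelope check of that guess (all figures are the guessed values above, not measurements):

```python
# Rough sanity check of the "150 to 200 simultaneous uploads" estimate.
# link_kbps is the 100Mb/s pinch point; uplink speeds are the guessed
# typical UK ADSL values from the text.
link_kbps = 100_000

for uplink_kbps in (512, 768):
    n_full = link_kbps // uplink_kbps   # uploads needed to fill the link
    print(f"{uplink_kbps} kb/s uplinks: ~{n_full} saturate the link")
```

With 768kb/s uplinks you saturate at roughly 130 uploads, with 512kb/s at roughly 195, which brackets the 150-200 figure.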


Can the BOINC servers monitor the incoming uploads and simply defer or 
deny new upload requests until the incoming data rate is seen to drop 
below 80% of link capacity?

*That has just got to be a simple fix* !
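A minimal sketch of that admission check, assuming the server can measure its own incoming traffic rate (the capacity constant and `admit_upload` name are illustrative, not real BOINC server code):

```python
# Defer/deny new upload requests while the measured incoming rate is
# above 80% of link capacity. Figures are the guesses from the text.
LINK_CAPACITY_KBPS = 100_000            # the 100Mb/s pinch point
THRESHOLD_KBPS = 0.8 * LINK_CAPACITY_KBPS

def admit_upload(measured_incoming_kbps: float) -> bool:
    """Accept a new upload only while the link has headroom."""
    return measured_incoming_kbps < THRESHOLD_KBPS

print(admit_upload(85_000))   # False -> defer/deny, client backs off
print(admit_upload(60_000))   # True  -> accept the upload
```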


Note that some control of incoming bandwidth can be achieved merely by 
delaying the server response to a request. So... hold a request open 
until some time limit and then issue a deny regardless, to kick the 
requesting client into a backoff count?
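That "delay then deny" idea might look something like this (timings and the reply string are illustrative assumptions; a real server would hold the connection asynchronously rather than block a thread):

```python
import time

def delayed_deny(hold_seconds: float = 5.0) -> str:
    """Hold the request open to pace the client's request rate,
    then refuse it so the client enters its normal backoff."""
    time.sleep(hold_seconds)              # ties up the client, not the link
    return "transient error: retry later"  # any deny triggers client backoff
```

The delay itself throttles how often each client can ask, and the deny ensures the big upload never hits the saturated link.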


> clients when the load is just silly" -- like 180,000 clients (s...@home) 
> each with 20 uploads retrying on average every couple of hours.
> 
> The thought I had yesterday would cause that 20 upload machine to skip 
> 19 attempts out of 20 when things got busy -- with no information from 
> the servers at all, except for the failed upload.

Use link traffic management to guarantee the NAKs get out...


> Reservation systems require a database and a whole lot of other 
> infrastructure.

Agreed, kill that one!


> Beyond not retrying every single WU independently, I'm thinking about a 
> way for a project to simply "announce" a couple of parameters for the 
> random-backoff logic.
> 
> Whatever we do, it has to assume that the project servers (all of them) 
> are unreachable -- it has to be "out of band."

That's what the present exponential backoff does (even if the backoff 
parameters need tuning).

However, that scheme only covers the case where there is /no/ contact 
with the servers. Having the backoff zeroed by just one successful 
contact on a saturated link means that the link is kept saturated.
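For reference, the present client-side exponential backoff is roughly of this shape (the min/max constants and the jitter range are illustrative guesses, not the BOINC client's actual values):

```python
import random

MIN_DELAY_S = 60           # first retry delay (guessed)
MAX_DELAY_S = 4 * 3600     # backoff ceiling (guessed)

def next_backoff(failures: int) -> float:
    """Random delay whose ceiling doubles with each consecutive
    failure, capped at MAX_DELAY_S; randomisation spreads retries."""
    ceiling = min(MAX_DELAY_S, MIN_DELAY_S * 2 ** failures)
    return random.uniform(MIN_DELAY_S, ceiling)
```

The problem noted above is that `failures` resets to zero on a single success, so on a saturated link the whole population snaps straight back to short delays.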


My assumption is that the network bottleneck is being DDoSed by a "data 
amplification" effect... The link gets into a state of saturation 
whereby the small 'request to upload' messages have a good chance of 
getting through, but the BIG uploads that follow then disgracefully 
fight it out for an extended period on the swamped link until TCP 
either eventually succeeds or finally gives up after wasting a large 
chunk of bandwidth.


Regards,
Martin

-- 
--------------------
Martin Lomas
m_boincdev ml1 co uk.ddSPAM.dd
--------------------
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.