... and I found this post to be very interesting:

<http://setiathome.berkeley.edu/forum_thread.php?id=54633&nowrap=true#918297>

It suggests that a big part of the problem is just too many connections, 
and making the pending connection queue small (making excessive 
connections go away faster) made a clear difference.
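For reference, the "pending connection queue" is the TCP listen backlog. A minimal sketch of shrinking it, in Python (the value 5 is illustrative, not what SETI actually used):

```python
import socket

# A small listen backlog caps the queue of accepted-but-unserviced
# connections; excess connection attempts get refused quickly instead
# of piling up as half-open state the server has to maintain.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))   # ephemeral port, just for the sketch
srv.listen(5)                # small pending-connection queue
print("listening on port", srv.getsockname()[1])
```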

If that's what actually happened, then the problem wasn't so much 
bandwidth as TCP handles in the server -- maintaining them, servicing 
them, and tearing them down.

In fact, more bandwidth could simply make more failed connections possible.

Part of what I've been saying is that you don't need to solve the "not 
enough bandwidth" problem and the "not enough handles" problem as 
separate problems.

They can be solved as a single generic "not enough" problem.

Slow the clients down to the point where you can service the 
instantaneous load, and throughput jumps.
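One generic way to slow the clients down is exponential backoff with jitter on retries; a sketch (the function name and constants are mine, not BOINC's):

```python
import random

def retry_delay(attempt, base=60.0, cap=3600.0):
    # Double the delay on each failed attempt, up to a cap, and add
    # jitter so retries from many clients don't arrive in synchronized
    # waves that recreate the instantaneous overload.
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(delay / 2, delay)
```

With base=60, the third retry lands somewhere between four and eight minutes out, spreading the load the server sees at any instant.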

-- Lynn

Richard Haselgrove wrote:
>> That only works if you have the bandwidth to actually get the data to the
>> validator.  The problem at the moment in SETI is the last mile of internet
>> connection.  There are several possible solutions, but having the upload
>> servers else where does not really help that much.  The uploaded data still
>> has to go through that last mile.
>>
>> jm7
> 
> I'd forgotten this message when I woke up this morning and posted what I 
> believe to be a possible interim solution, specifically for SETI:
> 
> http://setiathome.berkeley.edu/forum_thread.php?id=54631#918399
> 
> My idea: put a stripped-down upload server at the head-end of the 1 Gbit 
> Hurricane Electric link - co-location at PAIX, Campus or wherever. That 
> server would have the minimum possible BOINC functionality - basically, just 
> the cgi upload handler. It would perform one function only - to handle the 
> million-plus upload connections per day, accept and store the files. 
> Periodically, it would zip the files into an archive, and make ONE file 
> available to SSL - push or pull, your choice. The data gets up the hill, but 
> the million-plus connections (or ten million plus connection attempts) don't.
> 
> The figures: we reckon that you'd get 10,000 upload files, zipped to about 45 
> megabytes, every seven or eight minutes. That averages (and please check all 
> these figures) to about 1 megabit/sec: it might even be possible to negotiate 
> with Campus to utilise their network path, and avoid the Hurricane tunnel 
> entirely.
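Checking those figures, as requested, with the numbers quoted above:

```python
# One batch: ~45 MB zipped, every seven or eight minutes.
megabits = 45 * 8                      # one batch, in megabits
for minutes in (7, 8):
    rate = megabits / (minutes * 60)   # Mbit/s
    print(f"{minutes}-minute batches: {rate:.2f} Mbit/s")
```

That comes out at 0.75-0.86 Mbit/s, so "about 1 megabit/sec" holds up.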
> 
> I would anticipate:
> Create a folder for uploaded files
> Accept data until five minutes/10,000 file limit
> Create new folder, and rotate incoming files to it.
> Once all connections to the first folder are complete/timed out, zip it and 
> signal availability
> On confirmation of transfer and receipt, delete folder and zip
> Rinse and repeat
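The rotation loop above could be sketched as follows, in Python for concreteness (the real thing would likely be cron plus shell; every path and name here is made up, and a temp dir stands in for the real spool):

```python
import os, shutil, tarfile, tempfile, time

# Hypothetical concentrator rotation: batch the incoming uploads,
# archive the batch, and flag the archive for transfer to SSL.
spool = tempfile.mkdtemp()
incoming = os.path.join(spool, "incoming")
os.makedirs(incoming)
open(os.path.join(incoming, "result_001"), "w").close()   # stand-in upload

batch = os.path.join(spool, "batch-%d" % int(time.time()))
os.mkdir(batch)
for name in os.listdir(incoming):       # rotate: move completed files out;
    shutil.move(os.path.join(incoming, name), batch)  # new uploads keep landing in incoming/
# (production: wait here until connections to the old folder complete or time out)
with tarfile.open(batch + ".tgz", "w:gz") as tar:
    tar.add(batch, arcname=os.path.basename(batch))
open(batch + ".tgz.ready", "w").close() # signal availability for push/pull
shutil.rmtree(batch)                    # archive kept until receipt is confirmed
```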
> 
> John raised four issues on the message boards, which I'll summarise (John, 
> feel free to amplify if you think I've misrepresented you).
> 
> 1) SETI data is nearly incompressible - zipping won't help
> True, but that applies to the raw data files sent FROM Berkeley TO 
> volunteers. In SETI's case, the return data is small text/XML files, which do 
> compress. But John's point means that my suggestion can't be generalised to 
> all BOINC projects - those which have larger upload files, probably 
> compressed already (like Einstein and CPDN) wouldn't benefit.
> 
> 2) Even unzipping the archive on receipt requires scarce CPU power
> Someone at Berkeley will have to do the maths, but I think offloading those 
> million connections to a different server should release some spare CPU 
> cycles. And is unzip a particularly costly process?
> 
> 3) The zip file still has to get up the hill
> And could suffer packet loss. But is packet loss/retry more or less costly 
> than connection loss/resend? Maybe Lynn can help with that one.
> 
> 4) Reports are asynchronous and can occur at any time after the file is 
> uploaded
> This is the tricky one, and would require one minor BOINC server change.
> 
> Strictly speaking, it doesn't matter whether reporting is asynchronous or 
> synchronous: the critical path in the current server process is that the file 
> is uploaded before the validator runs. That is enforced by two separate 
> sequential rules:
> 
> The validator runs after the result is reported (enforced by the server)
> The result is reported after the file is uploaded (enforced by the client)
> 
> But if we could relax the critical path, the upload/report sequence becomes 
> asynchronous. And we can relax the critical path simply by saying that the 
> validator outcome "file not present" transitions to backoff/retry instead 
> of failing immediately.
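That relaxed rule might look something like this on the validator side (all names and the attempt cap are mine, not BOINC's actual code):

```python
from enum import Enum, auto

class Outcome(Enum):
    VALID = auto()
    INVALID = auto()
    RETRY_LATER = auto()   # new outcome: defer, don't fail

def validate(result_ok, file_present, attempt, max_attempts=10):
    # Missing upload file -> back off and retry instead of failing at once.
    # The cap is the safety that scavenges tasks stuck in infinite backoff.
    if not file_present:
        return Outcome.INVALID if attempt >= max_attempts else Outcome.RETRY_LATER
    return Outcome.VALID if result_ok else Outcome.INVALID
```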
> 
> To summarise -
> 
> Advantages
> ------------
> Relatively simple server requirements for the 'data concentrator' - just cgi, 
> and some filesystem-level cron scripting
> Much cheaper than $80,000 for 'fibre up the hill'
> Quicker to implement than more esoteric suggestions - I don't think there's 
> anything above that's more complicated than the staff regularly achieve in 
> their sleep!
> Scalable - multiple concentrators could be set up, on different continents if 
> desired.
> Reversible - just switch the upload DNS to point back to Bruno, and it'll 
> work as before
> 
> Disadvantages
> ---------------
> Another server to buy/scrounge, configure and manage
> At a remote location
> Requires a change to Validator logic, plus safeties to scavenge tasks which 
> go into infinite backoff
> Adds latency to the report/validate cycle, hence an increase in temporary 
> storage/database use
> Delayed user gratification (well, for half the users, anyway - the half who 
> report second)
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
