> That only works if you have the bandwidth to actually get the data to the
> validator. The problem at the moment in SETI is the last mile of internet
> connection. There are several possible solutions, but having the upload
> servers elsewhere does not really help that much. The uploaded data still
> has to go through that last mile.
>
> jm7
I'd forgotten this message when I woke up this morning and posted what I believe to be a possible interim solution, specifically for SETI:

http://setiathome.berkeley.edu/forum_thread.php?id=54631#918399

My idea: put a stripped-down upload server at the head end of the 1 Gbit Hurricane Electric link - co-location at PAIX, Campus, or wherever. That server would have the minimum possible BOINC functionality - basically, just the CGI upload handler. It would perform one function only: handle the million-plus upload connections per day, and accept and store the files. Periodically, it would zip the files into an archive and make ONE file available to SSL - push or pull, your choice. The data gets up the hill, but the million-plus connections (or ten-million-plus connection attempts) don't.

The figures: we reckon that you'd get 10,000 upload files, zipped to about 45 megabytes, every seven or eight minutes. That averages (and please check all these figures) to about 1 megabit/sec - 45 megabytes every ~450 seconds is roughly 0.8 Mbit/s. It might even be possible to negotiate with Campus to utilise their network path, and avoid the Hurricane tunnel entirely.

I would anticipate:

- Create a folder for uploaded files
- Accept data until the five-minute / 10,000-file limit is reached
- Create a new folder, and rotate incoming files to it
- Once all connections to the first folder are complete or timed out, zip the folder and signal availability
- On confirmation of transfer and receipt, delete the folder and the zip
- Rinse and repeat

John raised four issues on the message boards, which I'll summarise (John, feel free to amplify if you think I've misrepresented you).

1) SETI data is nearly incompressible - zipping won't help

True, but that applies to the raw data files sent FROM Berkeley TO volunteers. In SETI's case, the return data consists of small text/XML files, which do compress. But John's point means that my suggestion can't be generalised to all BOINC projects - those with larger upload files, probably compressed already (like Einstein and CPDN), wouldn't benefit.
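The folder-rotation cycle I anticipate above can be sketched as a small cron-style script. This is only a minimal illustration of the idea, not BOINC code: the function name, paths, and grace period are my own assumptions, and the "signal availability to SSL" step is left as a comment.

```python
import os
import time
import zipfile

def rotate_and_zip(incoming: str, grace_seconds: int = 60) -> str:
    """Seal the current upload folder, zip it, and return the archive path.

    Meant to run every few minutes, or when the 10,000-file limit is hit.
    All names here are illustrative assumptions, not actual BOINC components.
    """
    # Rotate: rename the live folder aside and recreate it, so the CGI
    # upload handler keeps accepting new files without interruption.
    batch = f"{incoming}.{int(time.time())}"
    os.rename(incoming, batch)          # atomic on the same filesystem
    os.makedirs(incoming)

    # Let in-flight connections to the old folder complete or time out.
    time.sleep(grace_seconds)

    # Zip the sealed batch into ONE archive for transfer up the hill.
    archive = batch + ".zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(batch)):
            zf.write(os.path.join(batch, name), arcname=name)

    # Signal availability to SSL here (push or pull, your choice); only on
    # confirmed receipt would the batch folder and the zip be deleted.
    return archive
```

Run from cron every five minutes, that gives the rinse-and-repeat cycle; the delete-on-confirmation step deliberately stays out of band, so nothing is lost if a transfer fails.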
2) Even unzipping the archive on receipt requires scarce CPU power

Someone at Berkeley will have to do the maths, but I think offloading those million connections to a different server should release some spare CPU cycles. And is unzip a particularly costly process?

3) The zip file still has to get up the hill

And could suffer packet loss. But is packet loss/retry more or less costly than connection loss/resend? Maybe Lynn can help with that one.

4) Reports are asynchronous and can occur at any time after the file is uploaded

This is the tricky one, and it would require one minor BOINC server change. Strictly speaking, it doesn't matter whether reporting is asynchronous or synchronous: the critical path in the current server process is that the file must be uploaded before the validator runs. That is enforced by two separate sequential rules:

- The validator runs after the result is reported (enforced by the server)
- The result is reported after the file is uploaded (enforced by the client)

But if we could relax the critical path, the upload/report sequence becomes asynchronous. And we can relax it simply by saying that the validator outcome "file not present" transitions to backoff/retry, instead of immediate failure.

To summarise -

Advantages
------------
- Relatively simple server requirements for the 'data concentrator' - just CGI, and some filesystem-level cron scripting
- Much cheaper than $80,000 for 'fibre up the hill'
- Quicker to implement than more esoteric suggestions - I don't think there's anything above that's more complicated than the staff regularly achieve in their sleep!
- Scalable - multiple concentrators could be set up, on different continents if desired
- Reversible - just switch the upload DNS to point back to Bruno, and it'll work as before

Disadvantages
---------------
- Another server to buy/scrounge, configure and manage
- At a remote location
- Requires a change to validator logic, plus safeties to scavenge tasks which go into infinite backoff
- Adds latency to the report/validate cycle, hence an increase in temporary storage/database use
- Delayed user gratification (well, for half the users, anyway - the half who report second)

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
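P.S. For issue 4, the validator change amounts to treating a missing upload file as "not yet" rather than "never". A hypothetical sketch of that one decision, with invented outcome names and backoff constants (the real BOINC validator is C++ and database-driven; this only illustrates the state logic, including the scavenging safety):

```python
import os

# Invented outcome labels - not actual BOINC validator codes.
RUN_VALIDATION = "run_validation"   # file present: validate as today
BACKOFF_RETRY = "backoff_retry"     # file still in transit up the hill
PERMANENT_FAIL = "permanent_fail"   # scavenged after too many retries

INITIAL_BACKOFF_S = 300             # assumed: first retry after 5 minutes
MAX_BACKOFF_S = 4 * 3600            # assumed safety cap against infinite backoff

def classify_reported_result(upload_path: str, backoff_s: int):
    """Return (outcome, next_backoff_seconds) for a reported result.

    The one change from the current rule: "file not present" becomes a
    backoff/retry transition instead of an immediate failure.
    """
    if os.path.exists(upload_path):
        return RUN_VALIDATION, 0
    if backoff_s >= MAX_BACKOFF_S:
        # The 'safeties to scavenge tasks which go into infinite backoff'.
        return PERMANENT_FAIL, 0
    # File not yet concentrated up the hill: wait and retry, doubling each time.
    next_backoff = INITIAL_BACKOFF_S if backoff_s == 0 else backoff_s * 2
    return BACKOFF_RETRY, next_backoff
```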
