I don't really understand the attraction of multiple, distributed
upload servers.
Taking SETI as our model, you could spot an upload server or two at
PAIX, or on Campus, and you could take uploads at gigabit rates.
Some sort of daemon on the upload server would then send work back to
the main site as quickly as possible without saturating the link.
... but you're still limited to the 80 or 90 megabits per second the
link back to the lab can carry.
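For concreteness, that forwarding daemon is at heart just a
rate-limited copy loop. A rough sketch, where the token-bucket scheme
and the constants are my own assumptions, not anything in BOINC today:

    // Pacing helper for a hypothetical forwarding daemon: a token
    // bucket that keeps the offsite -> onsite link below a ceiling.

    #include <algorithm>
    #include <ctime>

    const double RATE_BPS = 10e6;   // ~80 Mbit/s, expressed in bytes/s
    const double BURST    = 1e6;    // allow 1 MB bursts

    double tokens = BURST;          // bytes we may send right now
    time_t last   = time(0);

    // How many bytes may be sent immediately; call before each write,
    // and sleep briefly whenever it returns 0.
    size_t sendable(size_t want) {
        time_t now = time(0);
        tokens = std::min(BURST, tokens + RATE_BPS * double(now - last));
        last = now;
        size_t n = (size_t)std::min((double)want, tokens);
        tokens -= (double)n;
        return n;
    }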
It doesn't solve the problem (too many simultaneous connections); it
just moves it from client <--> upload server to offsite upload
server <--> onsite upload server.
It introduces a new problem, too. When a work unit is reported, it is
by definition waiting on the upload server. Having offsite upload
servers means you have work that is "reported but not yet here."
(As an aside: I'm not sure why the SETI upload servers have to be
off-line during a backup, unless they're sharing a volume with some
other process that gets backed up -- a dependency forced by not having
enough local disk space on the upload server??)
If BOINC says "reported but not ready for validation," that adds a
whole new round of "how long will this take?!?!?!?" messages from
fickle forum members.
... and it adds another step where valid work can be lost.
It doesn't really solve the problem (too many simultaneous uploads);
it just moves it.
The more I look at this problem, the more I look to RFC-2821,
especially the retry strategies used with e-mail. As a general rule,
offsite intermediate servers are not used there; they're possible, but
they often cause more trouble than they're worth.
The idea of not retrying every upload when one fails comes right out of
RFC-2821.
The biggest problem I have as someone who develops and operates SMTP
servers is that I can't control the client (at all). A server out
there with 20 messages for us can open 20 connections and deliver one
message per connection. It's dumb, but it happens. Many Microsoft
Exchange servers will retry every minute, which is just DUMB: if a
message didn't go through a minute ago, the odds are good that it
won't go through now.
Here are a few passages that might be interesting, from section
4.5.4.1:
    The sender MUST delay retrying a particular destination after one
    attempt has failed. In general, the retry interval SHOULD be at
    least 30 minutes; however, more sophisticated and variable strategies
    will be beneficial when the SMTP client can determine the reason for
    non-delivery.

    Retries continue until the message is transmitted or the sender gives
    up; the give-up time generally needs to be at least 4-5 days. The
    parameters to the retry algorithm MUST be configurable.

    A client SHOULD keep a list of hosts it cannot reach and
    corresponding connection timeouts, rather than just retrying queued
    mail items.
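That last paragraph maps directly onto BOINC: back off per upload
server, not per file, so that twenty queued files don't each hammer a
dead server on their own schedules. A rough sketch of the bookkeeping,
with all names hypothetical:

    // Hypothetical per-server back-off state, after RFC 2821's "list
    // of hosts it cannot reach". One entry per upload URL; every file
    // queued for that URL consults the same entry.

    #include <ctime>
    #include <map>
    #include <string>

    struct ServerState {
        int    n_failures;   // consecutive failed attempts
        time_t next_try;     // don't contact the server before this
    };

    std::map<std::string, ServerState> servers;  // keyed by upload URL

    bool ok_to_try(const std::string& url, time_t now) {
        std::map<std::string, ServerState>::iterator it =
            servers.find(url);
        return it == servers.end() || now >= it->second.next_try;
    }

    void record_failure(const std::string& url, time_t now,
                        double delay) {
        ServerState& s = servers[url];     // zeroed on first insertion
        s.n_failures++;
        s.next_try = now + (time_t)delay;  // delay from back-off policy
    }

    void record_success(const std::string& url) {
        servers.erase(url);   // reachable again; forget the history
    }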
The more I think about this, the more I think that the solution is to
follow SMTP:
Implement the retry strategy that Dr. Anderson has already checked in.

Change the back-off calculation to something more SMTP-like: many
servers make the first retry at ten minutes and simply double the
interval on each subsequent retry, up to some upper limit. I've found
four hours is good for mail, but BOINC might benefit from capping the
interval at 1/4 of the remaining time to the deadline, or 1 day,
whichever is smaller.
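A rough sketch of that calculation (the function and constants are
mine, not anything checked in):

    // First retry at 10 minutes, doubling per failure, capped at the
    // smaller of 1 day and 1/4 of the time left to the deadline.

    #include <algorithm>
    #include <ctime>

    const double FIRST_RETRY = 600;     // 10 minutes, in seconds
    const double ABS_CAP     = 86400;   // 1 day, in seconds

    // n_failures: failed upload attempts for this file so far (>= 1)
    double retry_delay(int n_failures, time_t now, time_t deadline) {
        double delay = FIRST_RETRY;
        for (int i = 1; i < n_failures; i++) {
            delay *= 2;                 // double on each failure
        }
        double remaining = std::max(0.0, double(deadline - now));
        return std::min(delay, std::min(ABS_CAP, remaining / 4));
    }

Doubling keeps the early retries responsive, and the deadline-aware
cap keeps a client from sleeping past its own deadline.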
It'd be nice to be able to tune those parameters dynamically, through
messages from the project to the BOINC clients, but there may not be a
practical way to do that.
-- Lynn
Mark Pottorff wrote:
> So, am I correct in assuming that if multiple upload servers are
> used, they are all presumed to be on the same storage network, or
> that server code would need customization to cross-check with all
> upload servers to respond to the file size queries? And that
> customization to support an incomplete upload that fails over to
> another server would be required as well?
>
>
> Running Microsoft's "System Idle Process" will never help cure cancer,
> AIDS nor Alzheimer's. But running rose...@home just might!
> http://boinc.bakerlab.org/rosetta/
>
>
> --- On Fri, 7/17/09, Mark Pottorff <[email protected]> wrote:
>
>> From: Mark Pottorff <[email protected]>
>> Subject: Re: Optimizing uploads.....
>> To: "BOINC dev" <[email protected]>
>> Date: Friday, July 17, 2009, 10:11 AM
>> While everyone seems to now be
>> interested in such things... is there any reason why files
>> that exceed the maximum upload size are pulled in their
>> entirety through the pipe, and sent to null??
>>
>> I mean, had this been an ACTUAL DoS... where someone spoofs
>> legitimate-looking file info, with 3 GB file sizes...
>>
>> See /sched/file_upload_handler.cpp, starting at line 339:
>>
>> 339    if (!config.ignore_upload_certificates) {
>> 340        if (nbytes > file_info.max_nbytes) {
>> 341            sprintf(buf,
>> 342                "file size (%d KB) exceeds limit (%d KB)",
>> 343                (int)(nbytes/1024), (int)(file_info.max_nbytes/1024)
>> 344            );
>> 345            copy_socket_to_null(in);
>> 346            return return_error(ERR_PERMANENT, buf);
>> 347        }
>>
>> Why read all the data?? Can't the response just be sent (as
>> if the hacker cares about a response), and the socket
>> closed?
>>
>> * * * If any projects are interested in implementing the
>> front-end, off network, buffer uploads (or downloads) tiered
>> sort of scheme, please let me know. * * *
>>
>> In SETI's case, just having a server receiving uploads
>> during the weekly backup window would help keep work flowing
>> more smoothly and avoid a weekly congestion period (perhaps
>> that is done already). It will also keep the non-permanent
>> connection users happy: whenever they choose to go online,
>> there will be an upload server available.
>>
>> In the case where multiple upload URLs exist on tasks, how
>> are file sizes determined accurately? Wouldn't all of the
>> upload servers have to be polled to see if they are the one
>> that actually received the file? Or, I mean, at least poll
>> through the list of servers until the file is found? Perhaps
>> the assumption is that all servers are sharing the same
>> network storage system? That doesn't seem very robust, nor
>> flexible.
>>
>> Same question applies to an interrupted upload. Won't it
>> have to be continued on the same server that it got started
>> with? I've not seen code that appears to support this. Are
>> multiple upload URLs even supported? Or does it always have
>> to be done behind a single URL? Perhaps it is handled on the
>> client side. What if upload to server 1 fails, upload to
>> server 2 gets started and is then interrupted, and then
>> server 2 goes down before the rest of the file is received?
>> How to continue upload or recover from this state?
>>
>
>
>
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.