On Monday 10 November 2003 09:32 pm, Ken Corson wrote:
> Tom Kaitchuck wrote:
> > requests out of both ends of the queue. One end is a LIFO. This would be
> > the default end. When a new thread becomes available, it grabs a request
> > from this end and removes it from the queue, and sends a query accept. On
> > the
>
> do we actually send an explicit QA message today? If not, we would need
> to for this proposal. Nothing difficult here.

Yeah, we do.

> > other end (the FIFO end) we should check if we have previously seen that
> > key before. If we have, send an accept and process the request. If not,
> > mark it as such, and leave it in the queue.
> >
> > Then, to prevent the queue from getting too large, there should be some
> > assumed timeout, where the requesting node just assumes it was dropped if
> > it has been more than a given amount of time without an accept. If that
> > much time has gone by and the request is still in the queue, then drop
> > it, but don't bother to send any message back.
>
> My only discomfort with this approach is the rough approximation of
> "time" between nodes. Assume that there is always a 5 second delay between
> a node sending a message and the other node receiving it. Of course, it
> will probably average much lower than that, but do not discount the effect
> of this delay. If requestor times out in 15 seconds, and receiver measures
> 14 seconds from receive time (he cannot use a timestamp provided by
> requestor - their respective clocks are NOT sync'ed closely enough)
> before he gets to it, he may send a response that will arrive after
> the requestor has timed it out. The bigger the timeout period, the less
> likely this is to happen, but it should always remain a concern. This
> is wasteful, and could cause some problems that are really hard to
> diagnose and debug.

You forget that we are working over TCP, so we know when they get the message. 
True, we don't know when they process it, but there shouldn't be any way for 
the requestor's relative clock to get ahead, provided that we allow time for 
the ACK to properly arrive. (And we'll know if it doesn't get there.)

>   I'm not sure that I like reordering (thus prioritizing) the requests.
> Suppose they are all for valid, findable content. The newest requests
> would always get serviced first, at the expense of earlier requests.
> This could lead to bad routing time measurements by the requestor,
> thus confusing NGR.

Not really; QRs will still come in fast. Your message being dropped entirely 
is a different case. Just use two estimators.
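
Something like this, say (a sketch only; the class and names are made up, 
not anything that exists in the tree):

// Keep a separate estimate for how long answered requests take and for
// how often requests vanish entirely, instead of folding drops into the
// response-time average.
class TwoEstimators {
    private double answeredTime = 1.0; // running avg time of answered requests
    private double dropRate = 0.0;     // running fraction that get dropped
    private static final double ALPHA = 0.05; // smoothing factor, arbitrary

    void reportAnswered(double seconds) {
        answeredTime += ALPHA * (seconds - answeredTime);
        dropRate += ALPHA * (0.0 - dropRate);
    }

    void reportDropped() {
        dropRate += ALPHA * (1.0 - dropRate);
    }

    // Expected cost of routing here, given some penalty charged for drops.
    double expectedTime(double dropPenalty) {
        return (1.0 - dropRate) * answeredTime + dropRate * dropPenalty;
    }
}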

> > Now, to prevent any single node from overloading you at the expense of others,
> > we should reject incoming requests with the probability of
> > (FractionOfCurrentlyPendingRequestsFromThisNode * Current%OfOverload).
> > This
>
> I read this as FractionOf*TOTAL*CurrentlyPendingRequests, FromThisNode.
> This does not take into account the various bandwidth capacities for each
> remote node, but it is still a _significant_ improvement over current.

Yes.
> > will ensure that a node that is using more of the resources will be more
> > likely to be rejected. Also, so that other nodes know how likely they are
> > to be rejected, it would be useful if a query reject contained these two
> > numbers.
>
> I'm sorry, which two numbers ?

FractionOfCurrentlyPendingRequestsFromThisNode and Current%OfOverload
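
In code the reject test would be something like this (again a sketch, 
hypothetical names):

class ProportionalReject {
    private final java.util.Random rng = new java.util.Random();

    // Reject with probability = (sender's share of our pending queue)
    // * (how overloaded we currently are, 0.0 to 1.0).
    boolean shouldReject(int pendingFromPeer, int pendingTotal,
                         double overloadFraction) {
        if (pendingTotal == 0) return false;
        double share = (double) pendingFromPeer / pendingTotal;
        return rng.nextDouble() < share * overloadFraction;
    }
}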

> > However the problem remains that regardless of the amount of resources
> > that any node uses they would get the same performance. To solve this,
> > instead of simply grabbing each request from the queue, each query on
> > the LIFO end should have a probability of being skipped over equal to
> > FractionOfCurrentlyPendingRequestsFromThisNode. This way other nodes
> > get more
>
> I strongly favor this consideration, of "dropping every n-th request,"
> which is the basic behavior that would result.

We don't necessarily want to drop it totally, just let other requests in ahead 
of it. 
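
Concretely, something like this sketch (Request, its sender field, and the 
bookkeeping map are all hypothetical):

// Pull from the LIFO end, but skip each request with probability equal
// to its sender's share of the pending queue, so a heavy user's requests
// wait while lighter users' requests go ahead of them.
// (imports: java.util.*)
Request takeNewest(ArrayDeque<Request> queue,
                   Map<Object, Integer> pendingBySender, Random rng) {
    int total = queue.size();
    Iterator<Request> it = queue.descendingIterator(); // newest first
    while (it.hasNext()) {
        Request r = it.next();
        double share = (double) pendingBySender.get(r.sender) / total;
        if (rng.nextDouble() >= share) { // survived the skip roll
            it.remove();
            return r;
        }
    }
    return null; // everything was skipped this pass; try again later
}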

> > This solves the problem such that, even under load, the average successful
> > time is fast. If we are overloaded, we don't necessarily waste bandwidth
> > to tell
>
> in some cases (requests) faster than others. This may distort the
> estimates in the requestor's NGR profile of 'us,' the local (requestee)
> node.

See my previous response.

> Tom - I would greatly favor a FIFO queue over the split LIFO/FIFO queue.
> It is more straightforward. I may have identified a problem with LIFO
> above.

It's not so easy to make all the things I originally proposed work with a pure 
FIFO. 
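
For reference, here is the shape of what I proposed, as a rough Java sketch 
(all names invented; none of this is real code from the tree):

// One deque worked from both ends.
// (imports: java.util.*)
ArrayDeque<Request> queue = new ArrayDeque<Request>();

void onRequestAccepted(Request r) {
    queue.addLast(r); // newest requests live at the tail
}

void onThreadFree() {
    // Default (LIFO) end: a freed thread grabs the newest request
    // and sends the query accept.
    Request r = queue.pollLast();
    if (r != null) {
        sendQueryAccept(r);
        process(r);
    }
}

void checkOldest(Set<Object> seenKeys, long timeoutMillis) {
    // FIFO end: if we've seen the oldest request's key before, accept
    // and process it; if it has sat past the assumed timeout, drop it
    // without bothering to send any message back.
    Request r = queue.peekFirst();
    if (r == null) return;
    if (seenKeys.contains(r.key)) {
        queue.pollFirst();
        sendQueryAccept(r);
        process(r);
    } else if (System.currentTimeMillis() - r.arrivalTime > timeoutMillis) {
        queue.pollFirst();
    }
}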

What follows doesn't really show many of the advantages of my proposal, but 
here is a quick comparison that took entirely too much time.

Here is a network situation that is very hard to handle under any 
circumstances. Number of incoming data requests in the time it takes us to 
process 1:
1 2 3 4 3 2 1 1 1 1 1 0 1 1 0 1
Total = 23
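
(How to read the tables: each row is one processing interval, i.e. the time 
it takes us to process one request. Accept is how many of that interval's 
incoming requests we accept; TimeForCompletion is how long the request 
finished in that interval took from arrival to completion.)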

In the accept all solution:

Accept: 1       Processing:  1
Accept: 2       Processing:  2
Accept: 3       Processing:  5
Accept: 4       Processing:  9
Accept: 3       Processing:  12
Accept: 2       Processing:  14
Accept: 1       Processing:  15
Accept: 1       Processing:  13
Accept: 1       Processing:  13
Accept: 1       Processing:  13
Accept: 1       Processing:  13
Accept: 0       Processing:  13
...

In the process oldest solution:

Accept: 1       QueueSize:  0   TimeForCompletion:  1
Accept: 2       QueueSize:  1   TimeForCompletion:  1
Accept: 3       QueueSize:  3   TimeForCompletion:  2
Accept: 4       QueueSize:  6   TimeForCompletion:  2
Accept: 3       QueueSize:  8   TimeForCompletion:  3
Accept: 2       QueueSize:  9   TimeForCompletion:  3
Accept: 1       QueueSize:  9   TimeForCompletion:  4
Accept: 1       QueueSize:  9   TimeForCompletion:  5
Accept: 1       QueueSize:  9   TimeForCompletion:  6
Accept: 1       QueueSize:  9   TimeForCompletion:  6
Accept: 1       QueueSize:  9   TimeForCompletion:  7
Accept: 0       QueueSize:  8   TimeForCompletion:  7
...

In the accept newest system:

Accept: 1       QueueSize:  0   TimeForCompletion:  1
Accept: 2       QueueSize:  1   TimeForCompletion:  1
Accept: 3       QueueSize:  3   TimeForCompletion:  1
Accept: 4       QueueSize:  6   TimeForCompletion:  1
Accept: 3       QueueSize:  8   TimeForCompletion:  1
Accept: 2       QueueSize:  9   TimeForCompletion:  1
Accept: 1       QueueSize:  9   TimeForCompletion:  1
Accept: 1       QueueSize:  9   TimeForCompletion:  1
Accept: 1       QueueSize:  9   TimeForCompletion:  1
Accept: 1       QueueSize:  9   TimeForCompletion:  1
Accept: 1       QueueSize:  9   TimeForCompletion:  1
Accept: 0       QueueSize:  8   TimeForCompletion:  7
Accept: 1       QueueSize:  8   TimeForCompletion:  1
Accept: 1       QueueSize:  8   TimeForCompletion:  1
Accept: 0       QueueSize:  7   TimeForCompletion:  11
...

All three of these systems have the same problem: they are incapable of 
dealing with a growing number of requests or a large queue size.

Real possibilities:

Fixed queue size, process oldest solution:

Accept: 1       QueueSize:  0   TimeForCompletion:  1
Accept: 2       QueueSize:  1   TimeForCompletion:  1
Accept: 3       QueueSize:  3   TimeForCompletion:  2
Accept: 0       QueueSize:  2   TimeForCompletion:  2
Accept: 2       QueueSize:  3   TimeForCompletion:  3
Accept: 0       QueueSize:  2   TimeForCompletion:  4
Accept: 1       QueueSize:  2   TimeForCompletion:  3
Accept: 1       QueueSize:  2   TimeForCompletion:  3
Accept: 1       QueueSize:  2   TimeForCompletion:  3
Accept: 1       QueueSize:  2   TimeForCompletion:  3
Accept: 1       QueueSize:  2   TimeForCompletion:  3
Accept: 0       QueueSize:  1   TimeForCompletion:  3
Accept: 1       QueueSize:  1   TimeForCompletion:  3
Accept: 1       QueueSize:  1   TimeForCompletion:  2
Accept: 0       QueueSize:  0   TimeForCompletion:  2
Accept: 1       QueueSize:  0   TimeForCompletion:  1

16 stages.      16 items processed.  Average delay = 39/16 = 2.4375.
                7 rejects.  Reject time = 1.
NG estimate:
Overall 16/23 = 69.57% success.
0.6957 * 2.4375 + 0.3043 * (1 + Global)
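
(Read these NG estimates as: success fraction * average success time + 
failure fraction * (time to learn of the failure + Global), where Global 
stands for the requestor's expected cost of starting the request over 
elsewhere.)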

Process newest while dropping data that is more than 3 turns old:

Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 0
Accept: 2       QueueSize:  1   TimeForCompletion:  1  #Dropped: 0
Accept: 3       QueueSize:  3   TimeForCompletion:  1  #Dropped: 0
Accept: 4       QueueSize:  5   TimeForCompletion:  1  #Dropped: 1
Accept: 3       QueueSize:  5   TimeForCompletion:  1  #Dropped: 2
Accept: 2       QueueSize:  3   TimeForCompletion:  1  #Dropped: 3
Accept: 1       QueueSize:  1   TimeForCompletion:  1  #Dropped: 2
Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 1
Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 0
Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 0
Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 0
Accept: 0       QueueSize:  0   TimeForCompletion: IDLE  #Dropped: 0
Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 0
Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 0
Accept: 0       QueueSize:  0   TimeForCompletion: IDLE  #Dropped: 0
Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 0

16 stages.      14 items processed.  Average delay = 14/14 = 1.
                0 rejects.  Reject time = 1.  9 dropped.  Drop time = 3.
NG estimate:
Overall 14/23 = 60.87% success.
0.6087 * 1 + 0.3913 * (3 + Global)

Process newest while rejecting above a fixed queue size:

Accept: 1       QueueSize:  0   TimeForCompletion:  1
Accept: 2       QueueSize:  1   TimeForCompletion:  1
Accept: 3       QueueSize:  3   TimeForCompletion:  1
Accept: 0       QueueSize:  2   TimeForCompletion:  2
Accept: 2       QueueSize:  3   TimeForCompletion:  1
Accept: 0       QueueSize:  2   TimeForCompletion:  2
Accept: 1       QueueSize:  2   TimeForCompletion:  1
Accept: 1       QueueSize:  2   TimeForCompletion:  1
Accept: 1       QueueSize:  2   TimeForCompletion:  1
Accept: 1       QueueSize:  2   TimeForCompletion:  1
Accept: 1       QueueSize:  2   TimeForCompletion:  1
Accept: 0       QueueSize:  1   TimeForCompletion:  10
Accept: 1       QueueSize:  1   TimeForCompletion:  1
Accept: 1       QueueSize:  1   TimeForCompletion:  1
Accept: 0       QueueSize:  0   TimeForCompletion:  14
Accept: 1       QueueSize:  0   TimeForCompletion:  1

16 stages.      16 items processed.  Average delay = 40/16 = 2.5.
                7 rejects.  Reject time = 1.  0 dropped.
NG estimate:
Overall 16/23 = 69.57% success.
0.6957 * 2.5 + 0.3043 * (1 + Global)

Process oldest, proportional reject, max queue size:

Accept: 1       QueueSize:  0   TimeForCompletion:  1
Accept: 2       QueueSize:  1   TimeForCompletion:  1
Accept: 2       QueueSize:  2   TimeForCompletion:  2
(3/2)
Accept: 1       QueueSize:  2   TimeForCompletion:  2
(4/3)
Accept: 1       QueueSize:  2   TimeForCompletion:  3
(3/4)
Accept: 1       QueueSize:  2   TimeForCompletion:  3
(2/3)
Accept: 1       QueueSize:  2   TimeForCompletion:  3
Accept: 1       QueueSize:  2   TimeForCompletion:  3
Accept: 1       QueueSize:  2   TimeForCompletion:  3
Accept: 1       QueueSize:  2   TimeForCompletion:  3
Accept: 1       QueueSize:  2   TimeForCompletion:  3
Accept: 0       QueueSize:  1   TimeForCompletion:  3
Accept: 1       QueueSize:  1   TimeForCompletion:  3
Accept: 1       QueueSize:  1   TimeForCompletion:  2
Accept: 0       QueueSize:  0   TimeForCompletion:  2
Accept: 1       QueueSize:  0   TimeForCompletion:  1

16 stages.      16 items processed.  Average delay = 38/16 = 2.375.
                7 rejects.  Reject time = 1.  0 dropped.

NG estimate:
Overall 16/23 = 69.57% success.
0.6957 * 2.375 + 0.3043 * (1 + Global)

My proposal: process newest, proportional reject, reject above a fixed queue 
size, and process the oldest if it is findable or else drop it:

Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 0
Accept: 2       QueueSize:  1   TimeForCompletion:  1  #Dropped: 0
Accept: 2       QueueSize:  2   TimeForCompletion:  1  #Dropped: 0
(3/2)
Accept: 1       QueueSize:  2   TimeForCompletion:  3  #Dropped: 0
(4/3)           (Queue time maxed, so oldest checked and processed)
Accept: 1       QueueSize:  1   TimeForCompletion:  1  #Dropped: 1
(3/4)           (Queue time maxed, oldest checked and dropped)
Accept: 1       QueueSize:  1   TimeForCompletion:  3  #Dropped: 0
(2/3)           (Queue time maxed, so oldest checked and processed)
Accept: 1       QueueSize:  1   TimeForCompletion:  1  #Dropped: 0
Accept: 1       QueueSize:  1   TimeForCompletion:  3  #Dropped: 0
                (Queue time maxed, oldest processed)
Accept: 1       QueueSize:  1   TimeForCompletion:  1  #Dropped: 0
Accept: 1       QueueSize:  1   TimeForCompletion:  3  #Dropped: 0
                (Queue time maxed, oldest processed)
Accept: 1       QueueSize:  1   TimeForCompletion:  1  #Dropped: 0
Accept: 0       QueueSize:  0   TimeForCompletion:  3  #Dropped: 0
Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 0
Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 0
Accept: 0       QueueSize:  0   TimeForCompletion: IDLE  #Dropped: 0
Accept: 1       QueueSize:  0   TimeForCompletion:  1  #Dropped: 0

16 stages.      15 items processed.  Average delay = 25/15 = 1.667.
                7 rejects.  Reject time = 1.  1 dropped.  Drop time = 3.
NG estimate:
Overall 15/23 = 65.22% success.
0.6522 * 1.667 + 0.3043 * (1 + Global) + 0.0435 * (3 + Global)
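
To make that concrete with a purely illustrative Global of 5: this scheme 
comes to 0.6522 * 1.667 + 0.3043 * 6 + 0.0435 * 8 = 3.26, versus 3.48 for 
process-oldest with proportional reject, 3.57 for process-newest with a 
fixed queue size, and 3.74 for the drop-after-3-turns scheme. Lower is 
better.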


Obviously the winning solution here is proportional rejection. However, 
implementing a split queue as I have suggested not only lets it rival the 
performance of anything else, but also brings all the other benefits I 
mentioned in my original message.
