You expressed the issues well.

On Wed, Apr 30, 2014 at 7:48 PM, Jeffrey Haas <[email protected]> wrote:
> On Tue, Apr 29, 2014 at 10:19:14AM -0400, Jamal Hadi Salim wrote:
>> This is back again with node overload.
>> Our experience with ForCES made us prioritize events and request-response
>> differently. This is important only when there is an overload case.
>> As an example if i had sufficient cycles/bandwidth/ram space to respond to 
>> either
>> an ADD or an event - I choose to use those resources to process and respond
>> to the ADD; which means events are not reliably delivered to the clients.
>>
>> I think something like this would be needed for I2RS.
>
> In our architecture, we are permitting multiple clients to communicate with
> one agent, so this somewhat compounds the issue.

Given the nature of what I2RS serves, it is a challenge (the problem
domain is simpler when serving files, for example).

> It does permit some amount
> of discussion of what we can do about a few issues in this problem space:
>
> - Pipelining: If you can submit multiple requests but they must be satisfied
>   in the order submitted (e.g. as per netconf), the amount of work that a
>   given request implies has impact on overall throughput.

I think we should allow pipelining and batching to improve throughput.
Pipelining should generally be windowed, with the number of transactions
in flight manageable by the client.
[I will add these to the "protocol requirements" I promised to send.]
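To make the windowing idea concrete, here is a rough client-side sketch. The class name, API, and the idea of a resizable window are illustrative assumptions, not anything defined by I2RS or NETCONF:

```python
from collections import deque

class PipelineWindow:
    """Client-side window limiting the number of in-flight requests.

    Hypothetical sketch: the window size stands in for the
    client-manageable limit on transactions in flight."""

    def __init__(self, size):
        self.size = size          # max transactions in flight
        self.in_flight = deque()  # request ids awaiting responses
        self.pending = deque()    # requests held back for window space

    def submit(self, request_id):
        """Queue a request; send it only if the window has room."""
        if len(self.in_flight) < self.size:
            self.in_flight.append(request_id)
            return True   # sent immediately
        self.pending.append(request_id)
        return False      # held until a response frees a slot

    def on_response(self, request_id):
        """A response frees a window slot; send the next pending request."""
        self.in_flight.remove(request_id)
        if self.pending:
            self.in_flight.append(self.pending.popleft())

    def resize(self, size):
        """The client adjusts the window; pending requests drain as slots open."""
        self.size = size
        while self.pending and len(self.in_flight) < self.size:
            self.in_flight.append(self.pending.popleft())
```

The point of `resize()` is that the window is under the client's control, per the requirement above, rather than fixed by the agent.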

> - Even if you're able to bypass this to some extent using multiple client
>   sessions, if the resources you're working with rendezvous at a common
>   blocking point (e.g. some RIB service that has internal mutual exclusion
>   semantics), then this can be problematic.  We're not doing locking and
>   thus we're not exploring the semantic of saying "would block".
>

You expressed this better than I did; it was a point I was trying to get
across in my earlier post. Essentially, access to the rendezvous point is
shared by many users/clients, and the agent is acting as a dispatcher. If
the access is asynchronous, something along the request or response path
is going to overload at some point, and then the typical remedy is to
start dropping. I think most of this is implementation-resolvable, e.g.
if you are using a windowed transport (like TCP/SCTP), you can stop
reading from the source that is creating work, and backpressure will
build up. But I don't think this resolves the issue if the source is a
shared resource.
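A small simulation of that stop-reading remedy, and of its limit. The class and field names are made up for illustration; the in-memory receive buffer stands in for unread socket data whose accumulation would fill the TCP window and push back on the client:

```python
from collections import deque

class AgentSession:
    """One client session on the agent. When the shared work queue hits
    its high-water mark, the agent stops reading this session's transport
    (as if removing the socket from its poll set), so backpressure reaches
    that client. Note the queue is *shared*: stalling one session does not
    stop other sessions from filling it, which is the residual problem."""

    def __init__(self, shared_queue, high_water=4):
        self.shared_queue = shared_queue
        self.high_water = high_water
        self.recv_buffer = deque()   # stands in for unread socket data
        self.reading = True          # False == we stopped reading the socket

    def deliver(self, msg):
        """Transport delivers a message; it sits unread until polled."""
        self.recv_buffer.append(msg)

    def poll(self):
        """Read from the transport only while the shared queue has room."""
        while self.reading and self.recv_buffer:
            if len(self.shared_queue) >= self.high_water:
                self.reading = False      # stop reading -> backpressure
                break
            self.shared_queue.append(self.recv_buffer.popleft())

    def drain(self, n):
        """Workers consumed n items; resume reading if room reappears."""
        for _ in range(n):
            if self.shared_queue:
                self.shared_queue.popleft()
        if len(self.shared_queue) < self.high_water:
            self.reading = True
```

With several `AgentSession` objects sharing one queue, any one of them can drive the queue to its high-water mark, which is exactly the shared-resource caveat above.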

> - The work in a reply may be huge.  Consider a single client having in its
>   work queue a "add route" and a "give me the entire BGP RIB".  Ordering
>   will clearly have impacts on how quickly the add route operation
>   completes.

Indeed.
There is also the other direction (client->agent),
e.g. a client doing 500K route updates (say in batches of 1K routes
per message) vs.
one that is doing a single route update.
This is a fairness challenge which is likely addressable via implementation.
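One implementation-level answer is per-client queues with round-robin dispatch, so the 500K-update client cannot starve the single-update one. This is just one possible discipline; nothing in I2RS mandates it:

```python
from collections import deque

def round_robin_dispatch(client_queues, budget):
    """Dispatch up to `budget` messages, cycling one message per client
    per turn. A client with a huge backlog gets throughput, but a client
    with a single update is served within one cycle, not after 500K
    messages. Illustrative sketch only."""
    order = deque(client_queues)      # cycle over client names
    dispatched = []
    while budget > 0 and any(client_queues[c] for c in client_queues):
        client = order[0]
        order.rotate(-1)              # advance the cycle
        q = client_queues[client]
        if q:
            dispatched.append((client, q.popleft()))
            budget -= 1
    return dispatched
```

With a "heavy" client holding many updates and a "light" client holding one, the light client's update is dispatched in the first cycle rather than behind the heavy client's entire backlog.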

> I suspect that (mutex example aside), operational semantics across more than
> one client session may address many of these issues.  What isn't clear is
> what needs to be addressed in our documents about such issues.

There are no issues to address if there is no overload, so what needs
to be addressed is "agent overload", I think. Priorities become important
to manage in such a case
(regardless of whether you use multiple sessions or a single one).
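To illustrate the kind of prioritization meant here, a sketch of an agent outbox that, once overloaded, sheds events before responses, in the spirit of the ForCES experience quoted at the top of the thread. The limit, eviction policy, and names are assumptions for illustration, not anything from a draft:

```python
import heapq

RESPONSE, EVENT = 0, 1   # lower value = higher priority

class OverloadOutbox:
    """Agent-side outbox. Under its capacity limit everything queues;
    at the limit, new events are dropped, and a new response may evict
    a queued event, so scarce resources go to request-response first."""

    def __init__(self, limit):
        self.limit = limit
        self.heap = []        # (priority, seq, msg) min-heap
        self.seq = 0          # FIFO tie-break within a priority class
        self.dropped_events = 0

    def enqueue(self, priority, msg):
        if len(self.heap) >= self.limit:
            if priority == EVENT:
                self.dropped_events += 1   # overloaded: shed the event
                return False
            # Overloaded but this is a response: evict a queued event.
            events = [e for e in self.heap if e[0] == EVENT]
            if events:
                self.heap.remove(max(events))   # drop newest queued event
                heapq.heapify(self.heap)
                self.dropped_events += 1
            else:
                return False   # full of responses; caller must back off
        heapq.heappush(self.heap, (priority, self.seq, msg))
        self.seq += 1
        return True

    def send_next(self):
        """Responses drain before events due to the priority ordering."""
        if self.heap:
            return heapq.heappop(self.heap)[2]
        return None
```

The same policy works whether the messages arrive over one session or many, which is the point above: the priority decision lives at the agent, not per session.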

cheers,
jamal

> -- Jeff

_______________________________________________
i2rs mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/i2rs
