Re: [alto] Incremental updates transport using SSE?

Xiao SHI Mon, 20 Oct 2014 08:29:24 -0700

Hi Wendy,

#1: Yes, that will solve the problem I was thinking of.


A follow-up on this: what if, say, the client gets a cost map which depends
on a version of the network map that the client has not received? Should
the client simply discard the cost map or its patch, or temporarily save
the cost map and wait? Or does this question enter the realm of
client/server implementation/optimization, so that it is no longer the
protocol's concern?

Anyhow, I agree that the resource-id should do.
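
To make the "simply discard" option concrete, here is a minimal sketch of a
client cache that drops cost maps whose network map tag no longer matches.
It is only an illustration under stated assumptions: the class and method
names (MapCache, on_network_map, and so on) are hypothetical and come from no
ALTO draft, and maps are plain dicts.

```python
from dataclasses import dataclass, field


@dataclass
class CostMap:
    resource_id: str
    dependent_tag: str  # tag of the network map this cost map depends on
    costs: dict = field(default_factory=dict)


class MapCache:
    """Hypothetical client-side cache (illustrative names only)."""

    def __init__(self):
        self.network_tag = None
        self.network_map = None
        self.cost_maps = {}  # resource-id -> CostMap

    def on_network_map(self, tag, network_map):
        # A new network map replaces the cached one; cost maps that
        # depended on a different tag are discarded.
        self.network_tag = tag
        self.network_map = network_map
        self.cost_maps = {
            rid: cm for rid, cm in self.cost_maps.items()
            if cm.dependent_tag == tag
        }

    def on_cost_map(self, cost_map):
        # Accept a cost map only if it matches the cached network map.
        if cost_map.dependent_tag != self.network_tag:
            return False
        self.cost_maps[cost_map.resource_id] = cost_map
        return True

    def usable_cost_map(self, resource_id):
        # Re-verify the tag before handing the cost map to the caller.
        cm = self.cost_maps.get(resource_id)
        if cm is not None and cm.dependent_tag == self.network_tag:
            return cm
        return None
```

The "save and wait" option would replace the early return in on_cost_map
with a holding area keyed by the missing tag; whether that is worth the
extra state is the open question above.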

#2: Sounds good. The reason I thought of this is that a network map could
be in multiple update streams. Should/could there be a preference for which
stream the client gets the network map updates from? Or could the client
simply find *any* update stream which contains the resource update it wants?
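
For what it's worth, the "find any stream" search is short. Below is a rough
sketch, assuming the IRD is already parsed into a dict keyed by resource id
and laid out like the example quoted further down in this thread (the
"capabilities"/"events" layout is the one proposed here, not a published
format). The function name find_update_streams is made up for illustration.

```python
def find_update_streams(ird, resource_id):
    """Return the ids of IRD entries that are event streams offering
    updates for resource_id (assumed IRD layout, see lead-in)."""
    matches = []
    for entry_id, entry in ird.items():
        if entry.get("media-type") != "text/event-stream":
            continue  # not an update stream resource
        events = entry.get("capabilities", {}).get("events", [])
        if any(ev.get("resource-id") == resource_id for ev in events):
            matches.append(entry_id)
    return matches


# Example IRD fragment mirroring the one quoted in this thread.
ird = {
    "network-map": {"media-type": "application/alto-networkmap+json"},
    "network-map-updates": {
        "media-type": "text/event-stream",
        "capabilities": {
            "events": [
                {"media-type": "application/alto-networkmap+json",
                 "resource-id": "network-map"},
                {"media-type": "application/merge-patch+json",
                 "resource-id": "routingcost-map"},
            ]
        },
    },
}

streams = find_update_streams(ird, "network-map")
```

If several streams match, the client still has to pick one, which is where a
server-side preference hint could come in.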

#3: That's exactly what I meant. Just double checking.
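
Just to pin down what I understand by server-side filtering, here is a small
sketch. It assumes the event type is the JSON object with "media-type" and
"resource-id" proposed in this thread, and that the filter body has the
{"events": [...]} shape from the quoted POST example. The helper names are
hypothetical.

```python
import json


def parse_event_type(event_field):
    """Decode the JSON carried in the SSE 'event' field into a
    (media-type, resource-id) pair."""
    obj = json.loads(event_field)
    return obj["media-type"], obj["resource-id"]


def filter_events(events, requested):
    """Yield only the (event_field, data) pairs whose event type
    appears in the client's filter request."""
    wanted = {(r["media-type"], r["resource-id"])
              for r in requested["events"]}
    for event_field, data in events:
        if parse_event_type(event_field) in wanted:
            yield event_field, data


# Filter body a client might POST (shape assumed from this thread).
requested = {"events": [
    {"media-type": "application/merge-patch+json",
     "resource-id": "routingcost-map"},
]}

# Events the server has pending, as (event field, data) pairs.
pending = [
    ('{"media-type":"application/merge-patch+json",'
     '"resource-id":"routingcost-map"}', "{}"),
    ('{"media-type":"application/merge-patch+json",'
     '"resource-id":"hopcount-map"}', "{}"),
]

sent = list(filter_events(pending, requested))
```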

Thanks,
Xiao


On Mon, Oct 20, 2014 at 10:14 AM, Wendy Roome <[email protected]>
wrote:

> Xiao,
>
> #1: I don't think that is a good idea. The tag is mutable -- it changes
> every update. I believe the event type should be immutable over updates.
> E.g., note that the IRD has resource ids, but not tags. I believe the
> scenario you described is easily avoidable: when the client gets a new
> network map event, the client updates the cached network map, including
> updating the tag, and automatically discards any cached cost map(s) that
> depended on the old tag. And in any case, before using a cost map, the
> client should always verify that the network map tag in the cost map
> matches the tag of the map used to look up the address.
>
> Or did I not understand your scenario properly?
>
> #2: That seems to be an unnecessary optimization and complication. It's a
> complication because it adds an additional constraint (that uri must exist,
> the resource must be an event stream, etc). It's unnecessary because, let’s
> be realistic, an IRD will have at most 20 or 30 entries. Searching them for
> an update stream is not a challenge, particularly if you remember that a
> client does that once and then keeps the stream open for hours.
>
> Also, if you do think that is necessary, the map resource should give the
> *resource id* of the associated update resource, not the uri. Only a
> resource's IRD entry should give the uri. Think of that as equivalent to
> normalizing a relational DB.
>
> Or perhaps there could be a capability (or an attribute) that says this
> resource does have an update stream, so the client knows whether to search
> the IRD for it.
>
> #3: I assumed that for a filtered update stream, the client would use a
> POST request to tell the server which events the client wanted, and the
> server would filter out (or just not send) the other events. Is that what
> you meant?
>
> - Wendy Roome
>
> From: Xiao SHI <[email protected]>
> Date: Fri, October 17, 2014 at 20:13
>
> To: Wendy Roome <[email protected]>
> Cc: Xiao SHI <[email protected]>, "Y. Richard Yang" <[email protected]>,
> IETF ALTO <[email protected]>
> Subject: Re: [alto] Incremental updates transport using SSE?
>
> Hi Wendy,
>
> I think this is a very good and efficient design. Just a few comments:
>
> 1. In the event field of the SSE, you included the resource-id but not the
> tag along with it. I agree that the tag seems unnecessary because each
> event is essentially an update of the map, and should be assigned a new tag
> if the update were not incremental. However, consider the following
> scenario: if, for whatever reason, the client only subscribed to the
> network map incr updates (or the server only provided the network map
> stream), and the client is requesting cost maps from the server via regular
> mechanism (i.e. always getting a full new map which contains the
> dependent-vtag), could there be issues when the version of the network map
> is not synchronized with the dependent vtag that the client is about to
> use? For example, if I got the cost map which depends on the network map a
> few incremental updates ago, but I have already processed those updates,
> there could potentially be PID mismatches and what not, right?
>
> Shall we simply go ahead and add the tag into the event field? As for cost
> maps, that would be the dependent-vtag. We can also put it in the id field.
>
> 2. In the IRD, would it make sense to include the incremental update uri
> in the entries for regular resources? For example, a client would probably
> find it useful to know that the server is providing
> "my-default-network-map," and its incr update stream can be found in
> "http://example.com/network-map-updates."
>
> So the IRD would look like this:
>   "network-map": {
>     ...,
>     "update-uri":"http://somewhere.com/network-map-updates"
>   },
>   "routingcost-map": {
>     ...,
>     "update-uri":"http://somewhere.com/network-map-updates"
>   },
>   "hopcount-map": {
>     ...,
>     "update-uri":"http://somewhere.com/network-map-updates"
>   },
>
>   "network-map-updates": {
>     "uri": "http://somewhere.com/network-map-updates",
>     "media-type": "text/event-stream",
>     "uses": ["network-map", "routingcost-map", "hopcount-map"],
>     "accepts": "application/alto-updatestreamfilter+json",
>     "capabilities": {
>       "events": [
>         {"media-type": "application/alto-networkmap+json",
>          "resource-id": "network-map"},
>         {"media-type": "application/alto-costmap+json",
>          "resource-id": "routingcost-map"},
>         {"media-type": "application/merge-patch+json",
>          "resource-id": "routingcost-map"},
>         {"media-type": "application/alto-costmap+json",
>          "resource-id": "hopcount-map"},
>         {"media-type": "application/merge-patch+json",
>          "resource-id": "hopcount-map"}
>       ]
>     }
>   }
>
> This saves the client the trouble of searching through the "uses" field
> in each stream resource. Also, when there are multiple streams for
> incremental updates for one map (e.g. one stream for networkmap A + costmap
> B, one stream for networkmap A + costmap C and D), the server could
> load-balance by providing the default update-uri for networkmap A in its
> IRD entry.
>
> 3. To reduce overall traffic, I assume for one particular event stream,
> the event filtering would be done on the server side, right?
>
> What do you think?
>
> Cheers,
> Xiao
>
> On Fri, Oct 17, 2014 at 1:35 PM, Wendy Roome <[email protected]>
> wrote:
>
>> Xiao,
>>
>> My first take was that update stream resources should be 1-1 with the
>> underlying full map resources. Then I realized that cost maps depend on
>> network maps, and clients & servers should coordinate changes to the two.
>> The easiest way to accomplish that was to use the same stream for both
>> network map and cost map updates, with the event type distinguishing
>> network map updates from cost map updates.
>>
>> But your comment reminded me that network maps are 1-n with cost maps.
>> E.g., a network map can have two cost maps, one for routingcost and the other
>> for hopcount.
>>
>> So I suggest that an update stream resource can provide updates for an
>> arbitrary set of resources. The server picks the set. Each update event has
>> a media-type (basically, full-map vs merge-patch), and the resource-id of
>> the map it updates. Because SSE only allows three fields (event, id and
>> data), I suggest we encode the media-type,resource-id pair in the event
>> type, as a JSON object with those field names, as in:
>>
>>     event: {"media-type":"application/alto-networkmap+json","resource-id":"network-map"}
>>
>> Here’s a detailed example of an IRD with an update stream service that
>> gives updates for the network map and the associated routingcost and
>> hopcount maps. It offers merge-patch incremental updates for the cost maps,
>> but only full updates for network maps:
>>
>>   "network-map": {...},
>>   "routingcost-map": {...},
>>   "hopcount-map": {...},
>>
>>   "network-map-updates": {
>>     "uri": "http://somewhere.com/network-map-updates",
>>     "media-type": "text/event-stream",
>>     "uses": ["network-map", "routingcost-map", "hopcount-map"],
>>     "accepts": "application/alto-updatestreamfilter+json",
>>     "capabilities": {
>>       "events": [
>>         {"media-type": "application/alto-networkmap+json",
>>          "resource-id": "network-map"},
>>         {"media-type": "application/alto-costmap+json",
>>          "resource-id": "routingcost-map"},
>>         {"media-type": "application/merge-patch+json",
>>          "resource-id": "routingcost-map"},
>>         {"media-type": "application/alto-costmap+json",
>>          "resource-id": "hopcount-map"},
>>         {"media-type": "application/merge-patch+json",
>>          "resource-id": "hopcount-map"}
>>       ]
>>     }
>>   }
>>
>> This is a POST request. The client sends
>> application/alto-updatestreamfilter+json data (a new MIME type) to select
>> the events it wishes to get. Here is how a client would request updates for
>> the network and routingcost maps, but not the hopcount map:
>>
>>   POST /network-map-updates HTTP/1.1
>>   Host: somewhere.com
>>   Content-Length: ###
>>   Content-Type: application/alto-updatestreamfilter+json
>>   Accept: text/event-stream,application/alto-error+json
>>
>>   {"events": [
>>      {"media-type": "application/alto-networkmap+json",
>>       "resource-id": "network-map"},
>>      {"media-type": "application/alto-costmap+json",
>>       "resource-id": "routingcost-map"},
>>      {"media-type": "application/merge-patch+json",
>>       "resource-id": "routingcost-map"}
>>   ]}
>>
>> The server would respond with something like this.
>>
>>   HTTP/1.1 200 OK
>>   Content-Type: text/event-stream
>>
>>   : Full network map, sent immediately.
>>   event: {"media-type":"application/alto-networkmap+json","resource-id":"network-map"}
>>   data: { ... full network map ... }
>>
>>   : Full routingcost map, sent immediately.
>>   event: {"media-type":"application/alto-costmap+json","resource-id":"routingcost-map"}
>>   data: { ... full routingcost cost map ... }
>>
>>   : Incremental routingcost map update, sent when enough costs have changed.
>>   event: {"media-type":"application/merge-patch+json","resource-id":"routingcost-map"}
>>   data: { ... merge-patch for routingcost cost map ... }
>>
>> Note that this omits the ids. They aren't really necessary. But if the
>> underlying SSE library expects them, the server can always use sequence
>> numbers 1, 2, 3, etc.
>>
>> How do people feel about this approach?
>>
>> - Wendy Roome
>>
>> From: Xiao SHI <[email protected]>
>> Date: Fri, October 17, 2014 at 10:10
>> To: Wendy Roome <[email protected]>
>> Cc: Xiao SHI <[email protected]>, "Y. Richard Yang" <[email protected]>,
>> IETF ALTO <[email protected]>
>>
>> Subject: Re: [alto] Incremental updates transport using SSE?
>>
>> Hi Wendy,
>>
>> I completely agree with you. I was thinking maybe there's some clever way
>> of using the SSE event-id, but the questionable saving is simply not worth
>> the logging and synchronization trouble.
>>
>> 1. One thing that follows this choice is to use one single SSE stream
>> (open connection) for each resource. I briefly considered having one open
>> connection and sending updates for all resources on the server via that
>> stream. The advantage of having only one connection open is smaller
>> overhead, and the disadvantage is that it would require more complicated
>> ID and version tagging, maybe in the event name/id. What do you think?
>>
>> 2. If the mapping between SSE stream and underlying map is one-to-one,
>> would it make sense for the resourceID of a stream to be the resourceID of
>> the underlying map and ".update" concatenated together? This way, when the
>> client actually requests and initiates the stream, the client would be
>> aware of the resourceID of the underlying map.
>>
>> 3. Following that, we should define the id field in the events. Richard
>> proposed using a tag as the id if it's a full network map, and
>> newtag_oldtag if it's a patch. The vtag for a network map is a
>> ResourceID+tag tuple; if the client knows the resourceID, could we use only
>> the tag? And should the cost maps simply use dependent-vtags? Since the id
>> field is a string, what format do we want the tags in? A few options are
>> 1) json with all whitespace removed; 2) an even more compact patterned
>> string.
>>
>> e.g. for network map patch, it could be:
>>
>> {"old-tag":{"resource-id":"my-default-network-map","tag":"3ee2cb7e8d63d9fab71b9b34cbf764436315542e"},"new-tag":{"resource-id":"my-default-network-map","tag":"c0ce023b8678a7b9ec00324673b98e54656d1f6d"}}
>>
>> or (since it must be the same resource):
>>
>> {"resource-id":"my-default-network-map","old-tag":"3ee2cb7e8d63d9fab71b9b34cbf764436315542e","new-tag":"c0ce023b8678a7b9ec00324673b98e54656d1f6d"}
>>
>> or (if the client knows the resource-id for the underlying map of the
>> stream, just use a delimiter):
>> 3ee2cb7e8d63d9fab71b9b34cbf764436315542e_c0ce023b8678a7b9ec00324673b98e54656d1f6d
>>
>> What do you think?
>>
>> Best,
>> Xiao
>>
>> On Fri, Oct 17, 2014 at 9:21 AM, Wendy Roome <[email protected]>
>> wrote:
>>
>>> Xiao,
>>>
>>> That is a very good point. I hoped someone would ask about that.
>>>
>>> It is a tradeoff. The disadvantage of requiring the server to send full
>>> map(s) at the start of each stream is that if the connection drops, we have
>>> to send the full map over again. But the advantage is that incremental
>>> updates are only within the context of this particular stream. We don't
>>> need to keep track of where we were.
>>>
>>> To see the issue, suppose we wanted to avoid re-sending the full map
>>> when the client reestablishes the connection. Then the server must assign a
>>> unique id to every incremental update event. When the client reconnects,
>>> the client gives the server the last known event id, and the server sends
>>> any events the client missed.
>>>
>>> BTW, SSE does define event ids. Each event has an id, and when
>>> connecting, the client gives a Last-Event-Id header with the last id it
>>> received (if any).
>>>
>>> So what is the problem with that? First, the server MUST assign a unique
>>> id to every incremental update, and they must be the same for all clients.
>>> That is, if the server has several incremental-update clients, the server
>>> must "clock" their update event streams at the same rate. Second, the
>>> server must keep a buffer of old update events, in case a client needs to
>>> reconnect. And third, because a reconnecting client may send an old (or
>>> bogus) Last-Event-Id, the server may have to send the full map(s) anyway.
>>>
>>> If we do not use event ids, then it is much simpler. Incremental updates
>>> are in the context of this stream. Period. No persistent state that must
>>> survive the end of the connection. And if desired, the server can send
>>> updates at different rates to different clients.
>>>
>>> So I think it boils down to how often we expect connections to drop
>>> unintentionally. If that is common, then, yes, we need to define event ids
>>> to avoid resending the full maps. But if we can assume accidental drops are
>>> rare, then I think it is simpler just to send the full map(s) for every new
>>> connection.
>>>
>>> - Wendy Roome
>>>
>>> From: Xiao SHI <[email protected]>
>>> Date: Thu, October 16, 2014 at 18:22
>>> To: Wendy Roome <[email protected]>
>>> Cc: "Y. Richard Yang" <[email protected]>, IETF ALTO <[email protected]>
>>> Subject: Re: [alto] Incremental updates transport using SSE?
>>>
>>> Hi Wendy,
>>>
>>> If a client accidentally dropped the connection and hopes to re-connect
>>> and receive the merge-patch events, would it make sense for the server not
>>> to send the first full-map event? It will save quite some effort that way.
>>>
>>> Best,
>>> Xiao
>>>
>>>
>>
>

_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto
