--On Monday, October 05, 2009 04:53:48 PM -0700 Russ Allbery <[email protected]> wrote:

> Jeffrey and I were talking some more about this and there is another
> decision point: whether to re-randomize the server list by weight for
> every call or to only do that when the TTLs expire.  Currently, we only do
> that when the TTL expires, but the DNS SRV RFC prefers doing it for every
> call (although allows us to specify otherwise).  My inclination is to
> allow either for AFS, at least for the time being.  Per call is probably
> better in some sense, but I don't think it's sufficiently better to
> require implementations do it.

The ordered server list is an implementation detail, and while some discussion of it is appropriate, I don't think we need to specify an exact algorithm. In fact, I think doing so will get pretty hairy, since it interacts with keeping track of down servers.

It's true that the spec gives us some leeway in specifying how to use the weight. In particular, it specifies an algorithm to be used in ordering target hosts having the same priority, but is silent on the question of whether that ordering is to be recomputed for each transaction/whatever. It also makes the assumption that clients will contact each host _in order_, without prior information about which hosts are up, while in practice AFS clients often have considerable information about which servers are up.

I'd suggest it is probably appropriate for AFS clients to use the weighting algorithm described in RFC2782, but omit those servers which are known to be down (by whatever mechanism is used for that). The random order should be reevaluated whenever the SRV data is refreshed, or if a server's up/down state changes. Ideally, a client would randomly select a new server for each call, but performance considerations may dictate doing so less often.
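
As a rough illustration of that selection step, here is a Python sketch rather than OpenAFS code. The `Srv` record, the `down` set of target names, and the `pick_up_server` name are all made up for the example; only the zero-weight-first ordering and the running-sum comparison come from RFC2782.

```python
import random
from typing import NamedTuple


class Srv(NamedTuple):
    # Hypothetical record holding the fields of one SRV RR that matter here.
    target: str
    priority: int
    weight: int


def pick_up_server(candidates, down, rng=random):
    """Pick one server among equal-priority candidates by RFC2782 weight,
    skipping any whose target is in the known-down set."""
    up = [s for s in candidates if s.target not in down]
    if not up:
        return None
    # RFC2782: zero-weight records go first, so they are chosen only when r == 0.
    up.sort(key=lambda s: s.weight != 0)
    total = sum(s.weight for s in up)
    r = rng.randint(0, total)  # uniform in [0, total], inclusive
    running = 0
    for s in up:
        running += s.weight
        if running >= r:
            return s
    return up[-1]  # not reached: running == total >= r at the last entry
```

Calling this once per call gives the "ideal" behavior; calling it only when the SRV data is refreshed or a server's state changes gives the cheaper one.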

I think we should do the following:

- REQUIRE that priorities be obeyed; a server with a numerically lower
 priority value MUST be tried before any server with a higher priority
 value, unless the former is known to be down.

- REQUIRE that clients use the weighting algorithm described in RFC2782
 to select among servers of equal priority.  However, this algorithm
 may be applied in any of three ways:
 (a) compute a complete randomly-ordered list of servers, then use
     that list to determine a server preference order, such that
     a server appearing earlier in the list will always be tried
     before any server appearing later in the list, unless the former
     is known to be down.
 (b) randomly select a single server each time a call is to be made.
 (c) randomly select a single server on a periodic basis, with all
     calls made to the most-recently selected server unless that
     server goes down, in which case a new server is selected.

- If method (a) is used, the client MAY omit known-down servers from
 the list.  If it does, then the client MUST employ some mechanism
 for discovering recovery of a down server, and MUST recompute the
 server list when the up/down state of a server changes.

- If method (a) is used and the client does not omit known-down servers
 from the list, then it SHOULD employ some mechanism for tracking
 which servers are down and discovering recovery of a down server,
 in order to avoid repeated calls to a down server.  But maybe we
 don't need to say this, since failing to do so just makes that
 client's performance sad.

- If method (b) or (c) is used, the client MUST omit known-down
 servers from the list, MUST employ some mechanism for discovering
 recovery of a down server, and MUST recompute the server list when
 the up/down state of a server changes.  This is just common sense;
 if you don't do this then random selection after a failed call may
 just select the same server again.


One way to implement (a) without omitting down servers in current OpenAFS is to compute server preferences based on priority and weight, in a fashion similar to that described in the current draft. Then the CM tries servers in order, but tracks down servers and doesn't make real calls to them.
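
To make method (a) without omission a bit more concrete, here is a minimal Python sketch. It is not how the OpenAFS CM is actually written; the `(priority, weight, target)` tuples and the names `rfc2782_order` and `next_server` are assumptions for illustration. The full list is ordered once, and known-down servers are skipped only when walking it.

```python
import random
from itertools import groupby


def rfc2782_order(records, rng=random):
    """records: iterable of (priority, weight, target) tuples.
    Returns all targets in try-order: ascending priority value, with an
    RFC2782 weighted shuffle inside each priority level."""
    order = []
    by_priority = sorted(records, key=lambda rec: rec[0])
    for _, group in groupby(by_priority, key=lambda rec: rec[0]):
        # Zero-weight records are placed first within the level.
        remaining = sorted(group, key=lambda rec: rec[1] != 0)
        while remaining:
            total = sum(rec[1] for rec in remaining)
            r = rng.randint(0, total)  # uniform in [0, total], inclusive
            running = 0
            for i, rec in enumerate(remaining):
                running += rec[1]
                if running >= r:
                    # Select this record, remove it, and repeat on the rest.
                    order.append(remaining.pop(i)[2])
                    break
    return order


def next_server(order, down):
    """First server in preference order that is not known to be down."""
    return next((t for t in order if t not in down), None)
```

Because down servers stay in the list, nothing has to be recomputed when a server's state changes; the walk simply starts landing on a different entry.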

Note that "recomputing the server list" when a server goes up or down doesn't have to include re-querying the SRV record, and in fact could simply mean keeping track of the current sum-of-weights and adjusting it whenever a server goes up or down. The running sum described in RFC2782 can be computed on the fly as an entry is selected, provided the total is known.
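
That bookkeeping might look roughly like the Python sketch below, for the servers at a single priority level. The `UpServerPool` class and its method names are hypothetical, and how a client learns that a server went up or down is left out entirely.

```python
import random


class UpServerPool:
    """Weighted per-call selection over the up servers of one priority level."""

    def __init__(self, weights):
        self.weights = dict(weights)  # target -> weight, taken from the SRV data
        self.up = set(self.weights)
        self.total = sum(self.weights.values())  # sum of weights of up servers

    def mark_down(self, target):
        if target in self.up:
            self.up.remove(target)
            self.total -= self.weights[target]  # adjust the sum, no SRV re-query

    def mark_up(self, target):
        if target in self.weights and target not in self.up:
            self.up.add(target)
            self.total += self.weights[target]

    def select(self, rng=random):
        if not self.up:
            return None
        r = rng.randint(0, self.total)  # uniform in [0, total], inclusive
        running = 0
        # Running sum is built while scanning; zero-weight servers come first.
        for target in sorted(self.up, key=lambda t: (self.weights[t] != 0, t)):
            running += self.weights[target]
            if running >= r:
                return target
        return None  # not reached while self.total matches the up set
```

Sorting on every `select` keeps the sketch short; a real client would more likely keep the up servers in a fixed order and only maintain the total.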


-- Jeff

_______________________________________________
AFS3-standardization mailing list
[email protected]
http://michigan-openafs-lists.central.org/mailman/listinfo/afs3-standardization
