OK thanks. We will work around it in the application layer.

On Mon, Dec 5, 2016 at 5:54 PM Qi Zhao <[email protected]> wrote:
> On Mon, Dec 5, 2016 at 5:35 PM, Arya Asemanfar <[email protected]> wrote:
>
> Since a TCP load balancer is only aware of TCP packets and not HTTP2 frames, it cannot multiplex requests from multiple clients onto 1 connection. A TCP load balancer makes a load balancing decision at connection establishment, not per stream, request, or packet.
>
> Re calling Close after a timer fires, this terminates in-flight requests so we'd need to duplicate the bookkeeping of outstanding streams, which makes this cumbersome.
>
> I do not think you have a hard requirement on when to reset the connection. Thus your code definitely can coordinate to reset the connection when there are no pending RPCs.
>
> Actually I think the clients have no idea what is going on with the server (e.g., restarting), so any client-triggered solution is either problematic or wrong -- for example, if a server restarts every 24 hours on average and clients set a 1 hr timer for connection reset, that adds 23 unnecessary resets per restart. I think the right solution is that the TCP proxy should detect the server is gone and then tear down the corresponding client connections. Then grpc clients will try to reconnect so that everything gets balanced.
>
> On Mon, Dec 5, 2016 at 4:35 PM, Qi Zhao <[email protected]> wrote:
>
> I am not clear how the TCP load balancer works. Is it a TCP proxy which forwards the traffic from the client? If yes, your description is still confusing to me because there should be 100 connections from the clients to the TCP proxy and 10 connections from the TCP proxy to the servers. Then I am confused about the uneven distribution you described. Please do not assume any domain knowledge and just make a synthetic example. :)
>
> Actually I think you can simply Dial a ClientConn and Close it when some timer fires in order to mimic the MaxConnectionLifetime you proposed regardless.
>
> On Mon, Dec 5, 2016 at 3:07 PM, Arya Asemanfar <[email protected]> wrote:
>
> Scenario A, and when I said "restarting server" I mean grpc server.
>
> On Mon, Dec 5, 2016 at 2:59 PM Qi Zhao <[email protected]> wrote:
>
> I am still confused. The scenario you want to serve as an example is
> a) grpc clients --> TCP LB --> grpc servers;
> or
> b) grpc clients --> grpc servers?
>
> On Mon, Dec 5, 2016 at 2:49 PM, Arya Asemanfar <[email protected]> wrote:
>
> Sorry, I meant grpc server. Yes you are right if the TCP load balancer restarts there is no problem, so my scenario only applies if the grpc server restarts.
>
> On Mon, Dec 5, 2016 at 2:17 PM Qi Zhao <[email protected]> wrote:
>
> On Mon, Dec 5, 2016 at 12:13 PM, Arya Asemanfar <[email protected]> wrote:
>
> Thanks for the feedback. Good idea re metadata for getting the Balancer to treat the connections as different. Will take a look at that.
>
> Some clarifications/questions inline:
>
> On Mon, Dec 5, 2016 at 11:11 AM, 'Qi Zhao' via grpc.io <[email protected]> wrote:
>
> Thanks for the info. My comments are inline.
>
> On Sun, Dec 4, 2016 at 7:27 PM, Arya Asemanfar <[email protected]> wrote:
>
> Hey all,
>
> We're considering implementing some patches to the golang grpc implementation. These are things we think would better fit inside of grpc rather than trying to achieve from outside. Before we go through the effort, we'd like to gauge whether these features would be welcome (assuming we'll work with owners to get a quality implementation). Some of these ideas are not fully fleshed out or may not be the best solution to the problem they aim to solve. I also try to state the problem, so if you have ideas on better ways to address these problems, please share :)
>
> *Add DialOption MaxConnectionLifetime*
> Currently, once a connection is established, it lives until there is a transport error or the client proactively closes the connection.
> These long-lived connections are problematic when using a TCP load balancer, such as the one provided by Google Container Engine and Google Compute Engine. At a clean start, clients will be somewhat distributed among the servers behind the load balancer, but if the servers go through a rolling restart, the servers will become unbalanced, as clients will have a higher likelihood of being connected to the first server that restarts, with the most recently restarted server having close to zero clients.
>
> I do not think long-lived connections are problematic as long as there is live traffic on them. We do have a plan to add idle shutdown to actively close the TCP connections which live long and have no traffic for a while. Which server to choose really depends on the load balancing policy you choose -- I do not see why your description could happen if you use a round-robin load balance policy.
>
> We have a single IP address that we give to GRPC (since the IP address is Google Cloud's TCP load balancer). The client establishes one connection and has no reason to disconnect in normal conditions.
>
> Here's an example scenario that results in uneven load:
> - 100 clients connected evenly to 10 servers
> - each of the 10 servers has about 10 connections
> - each of the clients sends about an equal amount of traffic to the server they are connected to
> - one of the servers restarts
> - the 10 clients that were connected to that 1 server re-establish connections
> - the new server, assuming it came up in time, has on average 1 connection, with each of the other 9 having 1 additional connection
> - now we have 10 servers, one with 1 client and 9 with 11 clients so the load is unevenly distributed
>
> What "server" do you mean here? My understanding is that all these 100 clients connect to the TCP load balancer.
>
> Is there another workaround for this problem other than adding another intermediate load balancer?
> Even then, the load to the load balancers would be uneven assuming we'd still need a TCP level LB given we're using Kubernetes in GKE.
>
> We propose fixing this by adding a MaxConnectionLifetime, which will force clients to disconnect after some period of time. We'll use the same mechanism as when an address is removed from a balancer (e.g. drain the connection, rather than abruptly throw errors).
>
> This should be achieved by the GRPCLB load balancer, which can sense the workload of all the servers and send a refreshed backend list when needed. I am not convinced MaxConnectionLifetime is a must.
>
> *Add DialOption NumConnectionsPerServer*
> This is related to the problem above. When a client is provided with a single address that points to a TCP load balancer, it's sometimes beneficial to have the client have multiple connections since the underlying performance might vary.
>
> I am not clear what you plan to do here. Do you want to create multiple connections to a single endpoint (e.g., TCP load balancer)? If yes, you can customize your load balancer impl to do that already (the endpoints with same address but different metadata are treated as different ones in grpc internals).
>
> Will try this out. Thanks for the suggestion.
>
> *Add ServerOption MaxConcurrentGlobalStreams*
> Currently there is only a way to limit the number of streams per client, but it'd be useful to do this globally. This could be achieved via an interceptor that returns StreamRefused, but thought it might be useful in grpc.
>
> This is something similar to what we plan to add for flow-control purposes. gRPC servers will have some knobs (e.g., ServerOption) to throttle the resource usage (e.g., memory) of the entire server.
>
> Cool, good to hear.
>
> *Add facility for retries*
> Currently, retries must happen in user-level code, but it'd be beneficial for performance and robustness to have a way to do this with GRPC.
> Today, if the server refuses a request with StreamRefused, the client doesn't have a way to retry on a different server; it can only reissue the request and hope it gets a different server. It also forces the client to reserialize the request, which is unnecessary, and given the cost of serialization with proto, it'd be nice to avoid this.
>
> This is also something on our road map.
>
> *Change behavior of Dial to not block on the balancer's initial list*
> Currently, when you construct a *grpc.ClientConn with a balancer, the call to Dial blocks until the initial set of servers is returned from the balancer and errors if the balancer returns an empty list. This is inconsistent with the behavior of the client when the balancer produces an empty list later in the life of the client.
>
> We propose changing the behavior such that Dial does not wait for the response of the balancer and thus also can't return an error when the list is empty. This not only makes the behavior consistent, it has the added benefit that callers don't need to do their own retries of Dial.
>
> If my memory serves, this discussion happened before. The name "Dial" indicates the dial operation needs to be triggered when it returns. We probably can add another public surface like "NewClientConn" to achieve what you want here.
>
> Ah I see, that's why it waits. That makes sense. NewClientConn would be great.
>
> To reiterate, these are just rough ideas and we're also in search of other solutions to these problems if you have ideas.
>
> Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups "grpc.io" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/grpc-io.
> To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/abaa9977-78ee-41d0-b0f5-a4e273dfd13a%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
>
> --
> Thanks,
> -Qi

--
To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/CAAtHmWLxVLMd_aBzc_zr8yk%2ByUSRH-q1vBSzcMwR00vE8_-Vvw%40mail.gmail.com
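
The uneven-load scenario from the thread (100 clients, 10 servers, one server restarts) can be reproduced with a small simulation. This is a minimal sketch under assumptions not stated in the thread: the `lb` type and its `connect` and `restart` methods are hypothetical, the balancer is modeled as strict round-robin over all backends, and the restarted server is assumed to be back up before its clients reconnect.

```go
package main

import "fmt"

// lb models a TCP load balancer that hands each new connection to the
// next backend in round-robin order. Real balancers may pick differently.
type lb struct {
	conns []int // open client connections per backend
	next  int   // backend that receives the next new connection
}

// connect places one new client connection on the next backend.
func (l *lb) connect() {
	l.conns[l.next]++
	l.next = (l.next + 1) % len(l.conns)
}

// restart drops every connection on backend i and lets those clients
// reconnect through the balancer. Connections on other backends are
// long-lived and stay where they are.
func (l *lb) restart(i int) {
	dropped := l.conns[i]
	l.conns[i] = 0
	for c := 0; c < dropped; c++ {
		l.connect()
	}
}

func main() {
	l := &lb{conns: make([]int, 10)}
	for c := 0; c < 100; c++ {
		l.connect()
	}
	fmt.Println(l.conns) // [10 10 10 10 10 10 10 10 10 10]

	l.restart(0)
	fmt.Println(l.conns) // [1 11 11 11 11 11 11 11 11 11]
}
```

Because the balancer only decides at connection establishment, nothing moves the 11 connections on each untouched backend back toward the restarted one; calling `restart` on each backend in turn shows how a rolling restart compounds the skew.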
