Also note https://github.com/grpc/proposal/blob/master/A26-grpclb-selection.md - this relates to what Yijie noted about how you'll need to configure the client to use the grpclb policy.
On Monday, February 6, 2023 at 11:51:40 AM UTC-8 Yijie Ma wrote: > The `round_robin` policy does not use SRV record or `balancer_addresses`. > Only `grpclb` policy utilizes that so you need to configure to use that > policy. See: > https://github.com/grpc/proposal/blob/master/A5-grpclb-in-dns.md. > > Also the current DNS resolver might have a limitation that if there are no > A or AAAA records, it will return an error instead of returning an empty > address list, so it might not be actually possible to use `grpclb` > configured via DNS without having at least one fallback address. > > On Monday, February 6, 2023 at 12:49:18 AM UTC-8 Alexandre BECQUET wrote: > >> Hello, did you find the source of the problem ? >> >> I have the same error : "empty address list". But there is no error with >> network or dns. >> >> Le jeudi 26 janvier 2023 à 17:53:57 UTC+1, Thomas Sanchez a écrit : >> >>> Hello folks, >>> I'm currently experimenting with the built in load balancer support, >>> specifically: >>> >>> >>> https://github.com/grpc/proposal/blob/ce4fe27b88eec315ec0d5fe1d40bb795c0ee94b1/A5-grpclb-in-dns.md >>> >>> I'm encountering an error and before creating a bug report I'd like to >>> make sure >>> I've not missed anything. >>> >>> We've setup the DNS according to the doc: >>> >>> $> dig +noall +answer SRV _grpclb._tcp.server._ >>> tcp.internal.example.com >>> _grpclb._tcp.server._tcp.internal.example.com. 120 IN SRV 0 0 2379 >>> nodes.internal.example.com. >>> >>> $> dig +noall +answer A nodes.internal.example.com >>> nodes.internal.example.com. 120 IN A 10.10.44.90 >>> nodes.internal.example.com. 120 IN A 10.10.19.222 >>> nodes.internal.example.com. 120 IN A 10.10.3.170 >>> >>> Using a Python client (3.10, grpcio 1.51.1): >>> >>> # [...] >>> channel = grpc.secure_channel( >>> "server._tcp.internal.example.com", >>> credentials=grpc.ssl_channel_credentials( >>> ca_cert_file, >>> cert_key_file, >>> cert_cert_file, >>> ), >>> options=( >>> ("grpc.dns_enable_srv_queries", 1), >>> ("grpc.lb_policy_name", "round_robin"), >>> ), >>> ) >>> server = rpc.ServerStub(channel) >>> req = rpcpb.StatusRequest() >>> server.Status(rep) >>> >>> When I execute this code, I get the following exception: >>> >>> Traceback (most recent call last): >>> File "/tmp/test.py", line 57, in <module> >>> server.Status(streq) >>> File "/....../grpc/_channel.py", line 946, in __call__ >>> return _end_unary_response_blocking(state, call, False, None) >>> File "/....../grpc/_channel.py", line 849, in >>> _end_unary_response_blocking >>> raise _InactiveRpcError(state) >>> grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that >>> terminated with: >>> status = StatusCode.UNAVAILABLE >>> details = "empty address list: " >>> debug_error_string = "UNKNOWN:empty address list: >>> {created_time:"2023-01-26T17:10:23.423558027+01:00", grpc_status:14}" >>> >>> When I enable the gRPC logs, I can see that ares properly resolves the >>> addresses: >>> >>> D0126 17:10:23.355144647 1897428 grpc_ares_wrapper.cc:726] >>> (c-ares resolver) request:0x888c20 on_srv_query_done_locked >>> name=_grpclb._tcp.server._tcp.internal.example.com ARES_SUCCESS >>> D0126 17:10:23.355170775 1897428 grpc_ares_wrapper.cc:731] >>> (c-ares resolver) request:0x888c20 ares_parse_srv_reply: 0 >>> D0126 17:10:23.355175243 1897428 grpc_ares_wrapper.cc:625] >>> (c-ares resolver) request:0x888c20 create_hostbyname_request_locked host: >>> nodes.internal.example.com port:19209 is_balancer:1 qtype:AAAA >>> D0126 17:10:23.355270526 1897428 grpc_ares_wrapper.cc:625] >>> (c-ares resolver) request:0x888c20 create_hostbyname_request_locked host: >>> nodes.internal.example.com port:19209 is_balancer:1 qtype:A >>> D0126 17:10:23.355363205 1897428 grpc_ares_wrapper.cc:191] >>> (c-ares resolver) request:0x888c20 Ref ev_driver 0x888d00 >>> D0126 17:10:23.355368326 1897428 grpc_ares_wrapper.cc:439] >>> (c-ares resolver) request:0x888c20 notify read on: c-ares fd: 6 >>> D0126 17:10:23.355388528 1897428 grpc_ares_wrapper.cc:202] >>> (c-ares resolver) request:0x888c20 Unref ev_driver 0x888d00 >>> D0126 17:10:23.422973484 1897428 grpc_ares_wrapper.cc:364] >>> (c-ares resolver) request:0x888c20 readable on c-ares fd: 6 >>> D0126 17:10:23.423048620 1897428 grpc_ares_wrapper.cc:710] >>> (c-ares resolver) request:0x888c20 on_hostbyname_done_locked: C-ares >>> status is not ARES_SUCCESS qtype=AAAA name=nodes.internal.example.com >>> is_balancer=1: DNS server returned answer with no data >>> D0126 17:10:23.423217106 1897428 grpc_ares_wrapper.cc:655] >>> (c-ares resolver) request:0x888c20 on_hostbyname_done_locked qtype=A host= >>> nodes.internal.example.com ARES_SUCCESS >>> D0126 17:10:23.423241972 1897428 grpc_ares_wrapper.cc:698] >>> (c-ares resolver) request:0x888c20 c-ares resolver gets a AF_INET result: >>> addr: 10.10.44.90 >>> port: 2379 >>> >>> D0126 17:10:23.423245904 1897428 grpc_ares_wrapper.cc:698] >>> (c-ares resolver) request:0x888c20 c-ares resolver gets a AF_INET result: >>> addr: 10.10.3.170 >>> port: 2379 >>> >>> D0126 17:10:23.423249513 1897428 grpc_ares_wrapper.cc:698] >>> (c-ares resolver) request:0x888c20 c-ares resolver gets a AF_INET result: >>> addr: 10.10.19.222 >>> port: 2379 >>> >>> D0126 17:10:23.423252890 1897428 grpc_ares_wrapper.cc:202] >>> (c-ares resolver) request:0x888c20 Unref ev_driver 0x888d00 >>> D0126 17:10:23.423261751 1897428 grpc_ares_wrapper.cc:216] >>> (c-ares resolver) request:0x888c20 delete fd: c-ares fd: 6 >>> D0126 17:10:23.423266242 1897428 grpc_ares_wrapper.cc:202] >>> (c-ares resolver) request:0x888c20 Unref ev_driver 0x888d00 >>> D0126 17:10:23.423270617 1897428 grpc_ares_wrapper.cc:293] >>> (c-ares resolver) request:0x888c20 ev_driver=0x888d00 on_timeout_locked. >>> driver->shutting_down=1. err=CANCELLED >>> D0126 17:10:23.423272866 1897428 grpc_ares_wrapper.cc:202] >>> (c-ares resolver) request:0x888c20 Unref ev_driver 0x888d00 >>> D0126 17:10:23.423275996 1897428 grpc_ares_wrapper.cc:318] >>> (c-ares resolver) request:0x888c20 ev_driver=0x888d00 >>> on_ares_backup_poll_alarm_locked. driver->shutting_down=1. err=CANCELLED >>> D0126 17:10:23.423278521 1897428 grpc_ares_wrapper.cc:202] >>> (c-ares resolver) request:0x888c20 Unref ev_driver 0x888d00 >>> D0126 17:10:23.423303740 1897428 grpc_ares_wrapper.cc:205] >>> (c-ares resolver) request:0x888c20 destroy ev_driver 0x888d00 >>> I0126 17:10:23.423316475 1897428 grpc_ares_wrapper.cc:552] >>> (c-ares resolver) request:0x888c20 c-ares address sorting: input[0]= >>> 10.10.44.90:2379 >>> I0126 17:10:23.423320669 1897428 grpc_ares_wrapper.cc:552] >>> (c-ares resolver) request:0x888c20 c-ares address sorting: input[1]= >>> 10.10.3.170:2379 >>> I0126 17:10:23.423323482 1897428 grpc_ares_wrapper.cc:552] >>> (c-ares resolver) request:0x888c20 c-ares address sorting: input[2]= >>> 10.10.19.222:2379 >>> I0126 17:10:23.423375716 1897428 grpc_ares_wrapper.cc:552] >>> (c-ares resolver) request:0x888c20 c-ares address sorting: output[0]= >>> 10.10.44.90:2379 >>> I0126 17:10:23.423378492 1897428 grpc_ares_wrapper.cc:552] >>> (c-ares resolver) request:0x888c20 c-ares address sorting: output[1]= >>> 10.10.3.170:2379 >>> I0126 17:10:23.423380187 1897428 grpc_ares_wrapper.cc:552] >>> (c-ares resolver) request:0x888c20 c-ares address sorting: output[2]= >>> 10.10.19.222:2379 >>> D0126 17:10:23.423382614 1897428 dns_resolver_ares.cc:396] >>> (c-ares resolver) resolver:0x69ed30 OnResolved() proceeding >>> I0126 17:10:23.423409387 1897428 polling_resolver.cc:137] >>> [polling resolver 0x8d7270] request complete >>> I0126 17:10:23.423413325 1897428 polling_resolver.cc:142] >>> [polling resolver 0x8d7270] returning result: addresses=<0 addresses>, >>> service_config=<null> >>> >>> As we can see in the last line we return 0 addresses instead of the >>> three. >>> When checking the code, dns_resolver_ares.cc, I wonder whether the code >>> handling the >>> `balancer_addresses_` is missing: >>> >>> - We should have only balancer addresses so we init `result.addresses` >>> with an >>> empty list: `ServerAddressList`: >>> - >>> https://github.com/grpc/grpc/blob/f99b8b5bc4a89d37f86bf063999ec22e44312663/src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc#L405 >>> - Further down, after any config handling if >>> present, we can see that addresses is not written, but instead >>> `result.arg` is >>> set with the addresses we resolved and put under the >>> `grpc.grpclb_balancer_addresses` key whereas the policy expect it in >>> the >>> addresses field >>> - >>> https://github.com/grpc/grpc/blob/f99b8b5bc4a89d37f86bf063999ec22e44312663/src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc#L428 >>> - >>> https://github.com/grpc/grpc/blob/50fbe51238e8932b180e5e18eaa9ab87d26ecc10/src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb_balancer_addresses.cc#L82 >>> - >>> https://github.com/grpc/grpc/blob/v1.51.1/src/core/ext/filters/client_channel/lb_policy/round_robin/round_robin.cc#L274 >>> >>> >>> Here are the next logs following the polling_resolver: >>> >>> I0126 17:10:23.423448035 1897428 round_robin.cc:239] >>> [RR 0x8da3f0] Created >>> I0126 17:10:23.423454769 1897428 round_robin.cc:271] >>> [RR 0x8da3f0] received update with 0 addresses >>> I0126 17:10:23.423458411 1897428 subchannel_list.h:367] >>> [RoundRobinSubchannelList 0x8da3f0] Creating subchannel list 0x8c3f40 >>> for 0 subchannels >>> I0126 17:10:23.423471480 1897428 channel_stack.cc:114] >>> CHANNEL_STACK: init DynamicFilters >>> I0126 17:10:23.423474992 1897428 channel_stack.cc:116] >>> CHANNEL_STACK: filter retry_filter >>> I0126 17:10:23.423491810 1897428 client_channel.cc:2169] >>> chand=0x908060 calld=0x950880: applying service config to call >>> I0126 17:10:23.423495972 1897428 client_channel.cc:2140] >>> chand=0x908060 calld=0x950880: removing from resolver queued picks list >>> I0126 17:10:23.423501202 1897428 polling_resolver.cc:172] >>> [polling resolver 0x8d7270] result status from channel: UNAVAILABLE: >>> empty address list: >>> I0126 17:10:23.423505125 1897428 polling_resolver.cc:197] >>> [polling resolver 0x8d7270] retrying in 1000 ms >>> I0126 17:10:23.423511785 1897428 client_channel.cc:2372] >>> chand=0x908060 calld=0x950880: creating dynamic call stack on >>> channel_stack=0x90dc20 >>> I0126 17:10:23.423519385 1897428 client_channel.cc:2068] >>> chand=0x908060 calld=0x950880: starting 1 pending batches on >>> dynamic_call=0x9512b0 >>> >>> >>> Is there anything I'm missing? >>> >>> Thanks a lot, >>> Thomas Sanchez >>> >>> -- You received this message because you are subscribed to the Google Groups "grpc.io" group. To unsubscribe from this group and stop receiving emails from it, send an email to grpc-io+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/grpc-io/7a4b0480-1352-41ac-b338-28cca471b5can%40googlegroups.com.