liu jing created KUDU-3358:
------------------------------
Summary: kudu tables fail to insert and scan when k8s network
changes
Key: KUDU-3358
URL: https://issues.apache.org/jira/browse/KUDU-3358
Project: Kudu
Issue Type: Bug
Affects Versions: 1.10.0
Reporter: liu jing
h3. Description
When Kubernetes (k8s) manages Kudu's network and the Kudu pods' IPs change
(for example, after a k8s restart), inserts into and scans of Kudu tables
fail.
h3. Steps to reproduce
The problem can be triggered as follows, using Impala as the client.
1. The original k8s service network for the Kudu pods looks like figure1:
{panel:title=figure1}
service-kudu-test01-entry  ClusterIP  10.98.78.224  <none>  8051/TCP,8050/TCP,7051/TCP,7050/TCP  2d22h
service-kudu-test01-master-0  ClusterIP  10.109.78.49  <none>  7051/TCP,8051/TCP,20051/TCP  2d22h
service-kudu-test01-master-1  ClusterIP  10.98.28.69  <none>  7051/TCP,8051/TCP,20051/TCP  2d22h
service-kudu-test01-master-2  ClusterIP  10.105.180.113  <none>  7051/TCP,8051/TCP,20051/TCP  2d22h
{color:#FF0000}service-kudu-test01-tserver-0  ClusterIP  10.106.224.20  <none>  7050/TCP,8050/TCP,20050/TCP  2d22h{color}
{color:#FF0000}service-kudu-test01-tserver-1  ClusterIP  10.110.69.131  <none>  7050/TCP,8050/TCP,20050/TCP  2d22h{color}
service-kudu-test01-tserver-2  ClusterIP  10.108.30.59  <none>  7050/TCP,8050/TCP,20050/TCP  2d22h
{panel}
2. Use Impala to create a table named *testTable*.
3. Restart the Kudu services with the following commands:
{code:java}
kubectl delete --force -f ${dirname}/xx.yaml
kubectl apply --force -f ${dirname}/xx.yaml{code}
This recreates the Kudu services with new ClusterIPs, as shown in figure2:
{panel:title=figure2}
service-kudu-test01-entry  ClusterIP  10.108.85.55  <none>  8051/TCP,8050/TCP,7051/TCP,7050/TCP  2m22s
service-kudu-test01-master-0  ClusterIP  10.96.245.192  <none>  7051/TCP,8051/TCP,20051/TCP  2m22s
service-kudu-test01-master-1  ClusterIP  10.105.96.68  <none>  7051/TCP,8051/TCP,20051/TCP  2m22s
service-kudu-test01-master-2  ClusterIP  10.103.221.65  <none>  7051/TCP,8051/TCP,20051/TCP  2m22s
{color:#FF0000}service-kudu-test01-tserver-0  ClusterIP  10.101.128.27  <none>  7050/TCP,8050/TCP,20050/TCP  2m22s{color}
{color:#FF0000}service-kudu-test01-tserver-1  ClusterIP  10.111.9.225  <none>  7050/TCP,8050/TCP,20050/TCP  2m22s{color}
service-kudu-test01-tserver-2  ClusterIP  10.104.26.31  <none>  7050/TCP,8050/TCP,20050/TCP  2m22s
{panel}
4. Use Impala to scan the table {*}testTable{*}:
{code:java}
select * from testTable
{code}
The Impala client then returns an error:
{code:java}
[service-impala-test01-server-0:21000] default> select * from testTable;
Query: select * from testTable
Query submitted at: 2022-03-07 15:13:04 (Coordinator:
http://service-impala-test01-server-0:25000)
Query progress can be monitored at:
http://service-impala-test01-server-0:25000/query_plan?query_id=c84e8a34795ca311:953d6fd800000000
ERROR: Unable to open scanner for node with id '0' for Kudu table
'impala::default.testTable': Timed out: exceeded configured scan timeout of
180.000s: after 3 scan attempts: Client connection negotiation failed: client
connection to 10.110.69.131:7050: Timeout exceeded waiting to connect: Network
error: Client connection negotiation failed: client connection to
10.106.224.20:7050: connect: Connection refused (error 111) {code}
From this error log, we can see that the Kudu master returns old tserver IPs
to the Impala client (compare the addresses with *figure1*). These IPs are no
longer reachable, so the Impala scan fails.
5. With the new network in place, use Impala to create a new table,
{*}testTable2{*}. The creation succeeds, but any insert into or select from
{*}testTable2{*} returns the same error:
{code:java}
ERROR: Unable to open scanner for node with id '0' for Kudu table
'impala::default.testTable2': Timed out: exceeded configured scan timeout of
180.000s: after 3 scan attempts: Client connection negotiation failed: client
connection to 10.110.69.131:7050: Timeout exceeded waiting to connect: Network
error: Client connection negotiation failed: client connection to
10.106.224.20:7050: connect: Connection refused (error 111) {code}
This indicates that the Kudu master still hands out the old tserver
addresses, even for a table created after the network change.
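The stale-address observation above can be checked mechanically. The following is a minimal sketch, not part of the original report: it extracts the ip:port endpoints from the Impala error text and labels each one against the tserver ClusterIPs copied from figure1 and figure2. The function name `classify_endpoints` and the dictionaries are hypothetical helpers introduced only for illustration.

```python
import re

# Tserver ClusterIPs copied from figure1 (before the restart) and
# figure2 (after the restart). The dictionary names are illustrative.
OLD_TSERVER_IPS = {
    "tserver-0": "10.106.224.20",
    "tserver-1": "10.110.69.131",
    "tserver-2": "10.108.30.59",
}
NEW_TSERVER_IPS = {
    "tserver-0": "10.101.128.27",
    "tserver-1": "10.111.9.225",
    "tserver-2": "10.104.26.31",
}

# Tail of the error message from step 4, joined onto one logical line.
ERROR_TEXT = (
    "Client connection negotiation failed: client connection to "
    "10.110.69.131:7050: Timeout exceeded waiting to connect: Network error: "
    "Client connection negotiation failed: client connection to "
    "10.106.224.20:7050: connect: Connection refused (error 111)"
)

# Matches an IPv4 address followed by a port, e.g. 10.106.224.20:7050.
ENDPOINT_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\b")


def classify_endpoints(error_text, old_ips, new_ips):
    """Label each ip:port in error_text as 'stale', 'current', or 'unknown'."""
    old_by_ip = {ip: name for name, ip in old_ips.items()}
    new_by_ip = {ip: name for name, ip in new_ips.items()}
    result = {}
    for ip, port in ENDPOINT_RE.findall(error_text):
        endpoint = f"{ip}:{port}"
        if ip in new_by_ip:
            result[endpoint] = ("current", new_by_ip[ip])
        elif ip in old_by_ip:
            result[endpoint] = ("stale", old_by_ip[ip])
        else:
            result[endpoint] = ("unknown", None)
    return result


if __name__ == "__main__":
    found = classify_endpoints(ERROR_TEXT, OLD_TSERVER_IPS, NEW_TSERVER_IPS)
    for endpoint, (status, name) in found.items():
        print(endpoint, status, name)
```

With the data from this report, both endpoints in the error are labeled stale (they belong to the old tserver-1 and tserver-0 services), which matches the conclusion that the client is still being directed to pre-restart addresses.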
h3. Workaround
If Kudu uses the local machine's network instead of the k8s-managed network,
the problem does not occur.