[
https://issues.apache.org/jira/browse/KUDU-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
liu jing updated KUDU-3358:
---------------------------
Description:
h3. Description
When I use Kubernetes (k8s) to manage Kudu's network, there is a problem: if
k8s restarts, or the Kudu pods' IPs change in any other way, inserts into and
scans of Kudu tables start to fail.
h3. Steps to reproduce
The problem can be reproduced with Impala as follows.
1. First, the original k8s service network looks like figure1:
{panel:title=figure1}
NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
service-kudu-test01-entry       ClusterIP   10.98.78.224     <none>        8051/TCP,8050/TCP,7051/TCP,7050/TCP   2d22h
service-kudu-test01-master-0    ClusterIP   10.109.78.49     <none>        7051/TCP,8051/TCP,20051/TCP           2d22h
service-kudu-test01-master-1    ClusterIP   10.98.28.69      <none>        7051/TCP,8051/TCP,20051/TCP           2d22h
service-kudu-test01-master-2    ClusterIP   10.105.180.113   <none>        7051/TCP,8051/TCP,20051/TCP           2d22h
{color:#ff0000}service-kudu-test01-tserver-0   ClusterIP   10.106.224.20    <none>        7050/TCP,8050/TCP,20050/TCP           2d22h{color}
{color:#ff0000}service-kudu-test01-tserver-1   ClusterIP   10.110.69.131    <none>        7050/TCP,8050/TCP,20050/TCP           2d22h{color}
service-kudu-test01-tserver-2   ClusterIP   10.108.30.59     <none>        7050/TCP,8050/TCP,20050/TCP           2d22h
{panel}
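For reference, a listing like figure1 comes straight from kubectl (the grep pattern below simply matches the service names shown above):
{code:bash}
# List the Kudu services and their current ClusterIPs.
kubectl get svc | grep service-kudu-test01
{code}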
2. Second, use Impala to create a table named *testTable*.
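A minimal sketch of such a table (the schema here is hypothetical; any Impala table backed by Kudu reproduces the issue):
{code:bash}
# Hypothetical schema; only STORED AS KUDU matters for the reproduction.
impala-shell -i service-impala-test01-server-0:21000 -q "
  CREATE TABLE testTable (id BIGINT PRIMARY KEY, val STRING)
  PARTITION BY HASH (id) PARTITIONS 3
  STORED AS KUDU;"
{code}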
3. Then, restart the pod services with:
{code:bash}
kubectl delete --force -f ${dirname}/xx.yaml
kubectl apply --force -f ${dirname}/xx.yaml
{code}
This moves the Kudu services onto new ClusterIPs, as shown in figure2:
{panel:title=figure2}
NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
service-kudu-test01-entry       ClusterIP   10.108.85.55     <none>        8051/TCP,8050/TCP,7051/TCP,7050/TCP   2m22s
service-kudu-test01-master-0    ClusterIP   10.96.245.192    <none>        7051/TCP,8051/TCP,20051/TCP           2m22s
service-kudu-test01-master-1    ClusterIP   10.105.96.68     <none>        7051/TCP,8051/TCP,20051/TCP           2m22s
service-kudu-test01-master-2    ClusterIP   10.103.221.65    <none>        7051/TCP,8051/TCP,20051/TCP           2m22s
{color:#ff0000}service-kudu-test01-tserver-0   ClusterIP   10.101.128.27    <none>        7050/TCP,8050/TCP,20050/TCP           2m22s{color}
{color:#ff0000}service-kudu-test01-tserver-1   ClusterIP   10.111.9.225     <none>        7050/TCP,8050/TCP,20050/TCP           2m22s{color}
service-kudu-test01-tserver-2   ClusterIP   10.104.26.31     <none>        7050/TCP,8050/TCP,20050/TCP           2m22s
{panel}
4. Then, use Impala to scan *testTable*:
{code:sql}
select * from testTable
{code}
The Impala client then returns an error like this:
{code}
[service-impala-test01-server-0:21000] default> select * from testTable;
Query: select * from testTable
Query submitted at: 2022-03-07 15:13:04 (Coordinator:
http://service-impala-test01-server-0:25000)
Query progress can be monitored at:
http://service-impala-test01-server-0:25000/query_plan?query_id=c84e8a34795ca311:953d6fd800000000
ERROR: Unable to open scanner for node with id '0' for Kudu table
'impala::default.testTable': Timed out: exceeded configured scan timeout of
180.000s: after 3 scan attempts: Client connection negotiation failed: client
connection to 10.110.69.131:7050: Timeout exceeded waiting to connect: Network
error: Client connection negotiation failed: client connection to
10.106.224.20:7050: connect: Connection refused (error 111) {code}
From this error log, we can see that the Kudu master returned an old tserver IP
to the Impala client (compare the IPs against *figure1*). That IP is no longer
reachable, so the Impala scan fails.
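One way to confirm what the master has registered (a sketch; it assumes the kudu CLI can reach the master services by the names above):
{code:bash}
# Ask the masters which tserver addresses they have registered.
# If the bug is present, rpc-addresses still shows the old ClusterIPs.
kudu tserver list \
  service-kudu-test01-master-0,service-kudu-test01-master-1,service-kudu-test01-master-2 \
  -columns=uuid,rpc-addresses
{code}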
5. On the new network, use Impala to create a new table *testTable2*. The
creation succeeds, but inserting into or selecting from *testTable2* fails with
the same error:
{code}
ERROR: Unable to open scanner for node with id '0' for Kudu table
'impala::default.testTable2': Timed out: exceeded configured scan timeout of
180.000s: after 3 scan attempts: Client connection negotiation failed: client
connection to 10.110.69.131:7050: Timeout exceeded waiting to connect: Network
error: Client connection negotiation failed: client connection to
10.106.224.20:7050: connect: Connection refused (error 111) {code}
This indicates that the Kudu master still tracks the tservers by their old
addresses, even for the newly created table.
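A cluster-wide check such as ksck should make the stale registrations visible (again a sketch, reusing the service names from above):
{code:bash}
# ksck contacts masters, tservers, and tablets and reports unreachable servers.
kudu cluster ksck \
  service-kudu-test01-master-0,service-kudu-test01-master-1,service-kudu-test01-master-2
{code}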
h3. Workaround
If Kudu runs on the host machine's network instead of behind k8s ClusterIP
services, the problem does not occur.
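Another direction that may help (untested here) is to give each tserver a stable DNS name via a headless Service and advertise that name instead of the pod IP, using Kudu's --rpc_advertised_addresses flag:
{code:bash}
# Hypothetical tserver startup. "$(hostname -f)" assumes a headless Service /
# StatefulSet that gives each pod a stable DNS name; masters would be set up
# similarly so their advertised addresses also survive pod restarts.
kudu-tserver \
  --fs_wal_dir=/data/kudu/wal \
  --fs_data_dirs=/data/kudu/data \
  --tserver_master_addrs=service-kudu-test01-master-0:7051,service-kudu-test01-master-1:7051,service-kudu-test01-master-2:7051 \
  --rpc_advertised_addresses=$(hostname -f):7050
{code}
With a stable advertised name, the address the master hands out to clients keeps resolving to the pod's new IP after a restart.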
> kudu tables fail to insert and scan when k8s network changes
> ------------------------------------------------------------
>
> Key: KUDU-3358
> URL: https://issues.apache.org/jira/browse/KUDU-3358
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: liu jing
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)