[ 
https://issues.apache.org/jira/browse/KUDU-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LUOYAJUN updated KUDU-2090:
---------------------------
    Description: 
Insert operations time out, and the tablet server logs show 'UpdateConsensus 
RPC' timeouts. The Kudu cluster consists of 3 masters and 8 tablet servers. 
There are 143 tables and 1115 tablets.

Some related messages from the tablet server log:

W0807 03:19:45.116417 20083 consensus_peers.cc:357] T 
5c0a1dbeeef04cc796d65746b5cda4dc P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
622a4488ce774290b2dcd3104a06ae3c (hadoop-04:7050): Couldn't send request to 
peer 622a4488ce774290b2dcd3104a06ae3c for tablet 
5c0a1dbeeef04cc796d65746b5cda4dc. Status: Timed out: UpdateConsensus RPC to 
10.20.110.4:7050 timed out after 1.000s (SENT). Retrying in the next heartbeat 
period. Already tried 6 times.
W0807 03:19:45.163341 20085 consensus_peers.cc:357] T 
bd89a18ccc0142d784942ecc130ff3b6 P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
cf66bf6093764bffa6387c241f9994c6 (hadoop-02:7050): Couldn't send request to 
peer cf66bf6093764bffa6387c241f9994c6 for tablet 
bd89a18ccc0142d784942ecc130ff3b6. Error code: TABLET_NOT_FOUND (6). Status: 
Timed out: UpdateConsensus RPC to 10.20.110.2:7050 timed out after 1.000s 
(SENT). Retrying in the next heartbeat period. Already tried 1 times.
W0807 03:19:45.320494 20083 consensus_peers.cc:357] T 
0b821119e2b849c38f981269da488fdc P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
622a4488ce774290b2dcd3104a06ae3c (hadoop-04:7050): Couldn't send request to 
peer 622a4488ce774290b2dcd3104a06ae3c for tablet 
0b821119e2b849c38f981269da488fdc. Error code: TABLET_NOT_FOUND (6). Status: 
Timed out: UpdateConsensus RPC to 10.20.110.4:7050 timed out after 1.000s 
(SENT). Retrying in the next heartbeat period. Already tried 7 times.
W0807 03:19:45.320538 20083 consensus_peers.cc:357] T 
8471841aa0114924868cfdf596e9bf95 P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
c0556f4e50a34b04b9f4b1ffc63f3ffb (hadoop-03:7050): Couldn't send request to 
peer c0556f4e50a34b04b9f4b1ffc63f3ffb for tablet 
8471841aa0114924868cfdf596e9bf95. Status: Timed out: UpdateConsensus RPC to 
10.20.110.3:7050 timed out after 1.000s (SENT). Retrying in the next heartbeat 
period. Already tried 7 times.
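
To see which peers are affected, the warnings above can be tallied per target 
host. The following is only a minimal sketch, not a Kudu tool: the helper names 
(split_entries, summarize) are hypothetical, the regular expressions are 
inferred from the four sample lines in this report, and it assumes the log text 
is wrapped across lines the way it appears in this email.

```python
import re
from collections import Counter

# Sample warnings copied from this report (wrapped as in the email).
SAMPLE = """\
W0807 03:19:45.116417 20083 consensus_peers.cc:357] T
5c0a1dbeeef04cc796d65746b5cda4dc P a8a23a2a3bb0446db77dcc85fc85530a -> Peer
622a4488ce774290b2dcd3104a06ae3c (hadoop-04:7050): Couldn't send request to
peer 622a4488ce774290b2dcd3104a06ae3c for tablet
5c0a1dbeeef04cc796d65746b5cda4dc. Status: Timed out: UpdateConsensus RPC to
10.20.110.4:7050 timed out after 1.000s (SENT). Retrying in the next heartbeat
period. Already tried 6 times.
W0807 03:19:45.163341 20085 consensus_peers.cc:357] T
bd89a18ccc0142d784942ecc130ff3b6 P a8a23a2a3bb0446db77dcc85fc85530a -> Peer
cf66bf6093764bffa6387c241f9994c6 (hadoop-02:7050): Couldn't send request to
peer cf66bf6093764bffa6387c241f9994c6 for tablet
bd89a18ccc0142d784942ecc130ff3b6. Error code: TABLET_NOT_FOUND (6). Status:
Timed out: UpdateConsensus RPC to 10.20.110.2:7050 timed out after 1.000s
(SENT). Retrying in the next heartbeat period. Already tried 1 times.
W0807 03:19:45.320494 20083 consensus_peers.cc:357] T
0b821119e2b849c38f981269da488fdc P a8a23a2a3bb0446db77dcc85fc85530a -> Peer
622a4488ce774290b2dcd3104a06ae3c (hadoop-04:7050): Couldn't send request to
peer 622a4488ce774290b2dcd3104a06ae3c for tablet
0b821119e2b849c38f981269da488fdc. Error code: TABLET_NOT_FOUND (6). Status:
Timed out: UpdateConsensus RPC to 10.20.110.4:7050 timed out after 1.000s
(SENT). Retrying in the next heartbeat period. Already tried 7 times.
W0807 03:19:45.320538 20083 consensus_peers.cc:357] T
8471841aa0114924868cfdf596e9bf95 P a8a23a2a3bb0446db77dcc85fc85530a -> Peer
c0556f4e50a34b04b9f4b1ffc63f3ffb (hadoop-03:7050): Couldn't send request to
peer c0556f4e50a34b04b9f4b1ffc63f3ffb for tablet
8471841aa0114924868cfdf596e9bf95. Status: Timed out: UpdateConsensus RPC to
10.20.110.3:7050 timed out after 1.000s (SENT). Retrying in the next heartbeat
period. Already tried 7 times.
"""

# Patterns inferred from the sample lines only (an assumption, not a log spec).
ENTRY_START = re.compile(r"[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6} ")
PEER = re.compile(r"-> Peer (?P<peer>[0-9a-f]{32}) \((?P<host>[^)]+)\)")
ERROR_CODE = re.compile(r"Error code: (?P<code>[A-Z_]+) \(\d+\)")
TRIES = re.compile(r"Already tried (?P<n>\d+) times")


def split_entries(text):
    """Rejoin wrapped lines into one string per log entry."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line):
            entries.append(line)          # a new entry starts here
        elif entries:
            entries[-1] += " " + line     # continuation of the previous entry
    return entries


def summarize(text):
    """Count UpdateConsensus timeouts per target host.

    Returns (timeouts, not_found, max_tries): two Counters keyed by host and
    a dict holding the highest 'Already tried N times' value seen per host.
    """
    timeouts, not_found, max_tries = Counter(), Counter(), {}
    for entry in split_entries(text):
        if "UpdateConsensus RPC" not in entry or "timed out" not in entry:
            continue
        peer = PEER.search(entry)
        if peer is None:
            continue
        host = peer.group("host")
        timeouts[host] += 1
        code = ERROR_CODE.search(entry)
        if code and code.group("code") == "TABLET_NOT_FOUND":
            not_found[host] += 1
        tries = TRIES.search(entry)
        if tries:
            max_tries[host] = max(max_tries.get(host, 0), int(tries.group("n")))
    return timeouts, not_found, max_tries


if __name__ == "__main__":
    timeouts, not_found, max_tries = summarize(SAMPLE)
    for host, count in timeouts.most_common():
        print(host, "timeouts:", count,
              "TABLET_NOT_FOUND:", not_found[host],
              "max tries:", max_tries.get(host, 0))
```

Running the same function over the full attached kudu-tserver.WARNING file 
(read with open(...).read()) would show whether the timeouts concentrate on a 
few tablet servers or are spread across the cluster.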


  was:
Insert operations time out, and the tablet server logs show 'UpdateConsensus 
RPC' timeouts. The Kudu cluster consists of 3 masters and 8 tablet servers. 
There are 143 tables and 1115 tablets.

Our engineer suspects that there are too many partitions and is preparing to 
migrate the data. 

However, the Kudu documentation is confusing on this point: it recommends 
limiting the number of tablets per server to 100 or fewer, while also stating 
the scale limit 'Recommended maximum number of tablets per tablet server is 
1000, post-replication'.

Some related messages from the tablet server log:

W0807 03:19:45.116417 20083 consensus_peers.cc:357] T 
5c0a1dbeeef04cc796d65746b5cda4dc P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
622a4488ce774290b2dcd3104a06ae3c (hadoop-04:7050): Couldn't send request to 
peer 622a4488ce774290b2dcd3104a06ae3c for tablet 
5c0a1dbeeef04cc796d65746b5cda4dc. Status: Timed out: UpdateConsensus RPC to 
10.20.110.4:7050 timed out after 1.000s (SENT). Retrying in the next heartbeat 
period. Already tried 6 times.
W0807 03:19:45.163341 20085 consensus_peers.cc:357] T 
bd89a18ccc0142d784942ecc130ff3b6 P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
cf66bf6093764bffa6387c241f9994c6 (hadoop-02:7050): Couldn't send request to 
peer cf66bf6093764bffa6387c241f9994c6 for tablet 
bd89a18ccc0142d784942ecc130ff3b6. Error code: TABLET_NOT_FOUND (6). Status: 
Timed out: UpdateConsensus RPC to 10.20.110.2:7050 timed out after 1.000s 
(SENT). Retrying in the next heartbeat period. Already tried 1 times.
W0807 03:19:45.320494 20083 consensus_peers.cc:357] T 
0b821119e2b849c38f981269da488fdc P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
622a4488ce774290b2dcd3104a06ae3c (hadoop-04:7050): Couldn't send request to 
peer 622a4488ce774290b2dcd3104a06ae3c for tablet 
0b821119e2b849c38f981269da488fdc. Error code: TABLET_NOT_FOUND (6). Status: 
Timed out: UpdateConsensus RPC to 10.20.110.4:7050 timed out after 1.000s 
(SENT). Retrying in the next heartbeat period. Already tried 7 times.
W0807 03:19:45.320538 20083 consensus_peers.cc:357] T 
8471841aa0114924868cfdf596e9bf95 P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
c0556f4e50a34b04b9f4b1ffc63f3ffb (hadoop-03:7050): Couldn't send request to 
peer c0556f4e50a34b04b9f4b1ffc63f3ffb for tablet 
8471841aa0114924868cfdf596e9bf95. Status: Timed out: UpdateConsensus RPC to 
10.20.110.3:7050 timed out after 1.000s (SENT). Retrying in the next heartbeat 
period. Already tried 7 times.



> TABLET_NOT_FOUND (6). Status: Timed out: UpdateConsensus RPC
> ------------------------------------------------------------
>
>                 Key: KUDU-2090
>                 URL: https://issues.apache.org/jira/browse/KUDU-2090
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 1.3.0
>         Environment: KUDU-1.3.0-1.cdh5.11.0.p0.12,CentOS Linux release 
> 7.3.1611 (Core)
>            Reporter: LUOYAJUN
>         Attachments: kudu-tserver.WARNING
>
>
> Insert operations time out, and the tablet server logs show 'UpdateConsensus 
> RPC' timeouts. The Kudu cluster consists of 3 masters and 8 tablet servers. 
> There are 143 tables and 1115 tablets.
> Some related messages from the tablet server log:
> W0807 03:19:45.116417 20083 consensus_peers.cc:357] T 
> 5c0a1dbeeef04cc796d65746b5cda4dc P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
> 622a4488ce774290b2dcd3104a06ae3c (hadoop-04:7050): Couldn't send request to 
> peer 622a4488ce774290b2dcd3104a06ae3c for tablet 
> 5c0a1dbeeef04cc796d65746b5cda4dc. Status: Timed out: UpdateConsensus RPC to 
> 10.20.110.4:7050 timed out after 1.000s (SENT). Retrying in the next 
> heartbeat period. Already tried 6 times.
> W0807 03:19:45.163341 20085 consensus_peers.cc:357] T 
> bd89a18ccc0142d784942ecc130ff3b6 P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
> cf66bf6093764bffa6387c241f9994c6 (hadoop-02:7050): Couldn't send request to 
> peer cf66bf6093764bffa6387c241f9994c6 for tablet 
> bd89a18ccc0142d784942ecc130ff3b6. Error code: TABLET_NOT_FOUND (6). Status: 
> Timed out: UpdateConsensus RPC to 10.20.110.2:7050 timed out after 1.000s 
> (SENT). Retrying in the next heartbeat period. Already tried 1 times.
> W0807 03:19:45.320494 20083 consensus_peers.cc:357] T 
> 0b821119e2b849c38f981269da488fdc P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
> 622a4488ce774290b2dcd3104a06ae3c (hadoop-04:7050): Couldn't send request to 
> peer 622a4488ce774290b2dcd3104a06ae3c for tablet 
> 0b821119e2b849c38f981269da488fdc. Error code: TABLET_NOT_FOUND (6). Status: 
> Timed out: UpdateConsensus RPC to 10.20.110.4:7050 timed out after 1.000s 
> (SENT). Retrying in the next heartbeat period. Already tried 7 times.
> W0807 03:19:45.320538 20083 consensus_peers.cc:357] T 
> 8471841aa0114924868cfdf596e9bf95 P a8a23a2a3bb0446db77dcc85fc85530a -> Peer 
> c0556f4e50a34b04b9f4b1ffc63f3ffb (hadoop-03:7050): Couldn't send request to 
> peer c0556f4e50a34b04b9f4b1ffc63f3ffb for tablet 
> 8471841aa0114924868cfdf596e9bf95. Status: Timed out: UpdateConsensus RPC to 
> 10.20.110.3:7050 timed out after 1.000s (SENT). Retrying in the next 
> heartbeat period. Already tried 7 times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
