[ 
https://issues.apache.org/jira/browse/KUDU-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233637#comment-16233637
 ] 

ZhangZhen commented on KUDU-2206:
---------------------------------

Hi Adar, tserver 07 has 344 tablets running on it. Since this is a staging 
cluster, most of the tablets are empty; only 24 tablets have data, and each 
uses about 40M of disk space. And yes, this is the 1.3.0 release of Kudu.
I just reproduced the issue and, at the same time, ran pstack on tserver 07 10 
times at 1-second intervals. You can find the pstack output in the 
attachment.
Sorry, I was unable to get more clues from the /tracing.html tool: after 
choosing the maintenance category and clicking the record button, nothing 
seems to be recorded, and I see nothing after I click finish. Is there a wiki 
page about using this tool?




> Kudu client create table timeout
> --------------------------------
>
>                 Key: KUDU-2206
>                 URL: https://issues.apache.org/jira/browse/KUDU-2206
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: ZhangZhen
>            Priority: Major
>         Attachments: kudu_master.log, pstack.zip, tserver_01_0f53a0d3.log, 
> tserver_07_23f962e4a1.log, tsever_02_0a8bbcbb.log
>
>
> We encountered an RPC timeout exception when using Spark SQL, which uses the 
> Java Kudu client internally, to create a table on a Kudu cluster. The cluster 
> has 10 tservers and 1 master on 10 machines; the target table has 10 range 
> partitions and 5 hash partitions. 
> From the web UI, I found that it took about 3 minutes before all the tablets 
> elected a leader, and I can see a lot of delete tablet records in the UI like:
> Delete Tablet Running 2.13 min        719f0f496bc34a469e4069b2861b4be8 Delete 
> Tablet RPC for TS=044f1da9a27c46acb82b1386f829f4dc
> I also found many retry records in the tserver logs, like:
> W1031 23:04:40.088256  5816 consensus_peers.cc:357] T 
> fcde65c4e4cf4df29b9ef9884ce292b2 P 0f53a0d3ef7e44ebb0365c800752d5bd -> Peer 
> 23f962e4a1744381ad5fa0d2d8b10241 (c3-kudu-tst-st07.bj:18700): Couldn't send 
> request to peer 23f962e4a1744381ad5fa0d2d8b10241 for tablet 
> fcde65c4e4cf4df29b9ef9884ce292b2. Error code: TABLET_NOT_RUNNING (12). 
> Status: Illegal state: Tablet not RUNNING: NOT_STARTED. Retrying in the next 
> heartbeat period. Already tried 94 times.
> The attachments contain the master and tserver logs, starting from when the 
> master received the create table request.
> The Kudu version is 1.3.0; the nearest commit is 
> 00813f96b9cb0c9ec57a17e5c85242f7679db0e0
> The exception the client received looks like:
> Error: org.apache.kudu.client.NonRecoverableException: RPC can not complete 
> before timeout: KuduRpc(method=IsCreateTableDone, tablet=null, attempt=25, 
> DeadlineTracker(timeout=30000, elapsed=28499), Traces: [0ms] sending RPC to 
> server , [0ms] received from server  response OK, [20ms] sending RPC to 
> server , [20ms] received from server  response OK, [40ms] sending RPC to 
> server , [40ms] received from server  response OK, [59ms] sending RPC to 
> server , [60ms] received from server  response OK, [80ms] sending RPC to 
> server , [80ms] received from server  response OK, [100ms] sending RPC to 
> server , [100ms] received from server  response OK, [140ms] sending RPC to 
> server , [141ms] received from server  response OK, [200ms] sending RPC to 
> server , [200ms] received from server  response OK, [319ms] sending RPC to 
> server , [320ms] received from server  response OK, [780ms] sending RPC to 
> server , [780ms] received from server  response OK, [2740ms] sending RPC to 
> server , [2741ms] received from server  response OK, [3580ms] sending RPC to 
> server , [3580ms] received from server  response OK, [4840ms] sending RPC to 
> server , [4840ms] received from server  response OK, [7080ms] sending RPC to 
> server , [7081ms] received from server  response OK, [8320ms] sending RPC to 
> server , [8321ms] received from server  response OK, [11620ms] sending RPC to 
> server , [11621ms] received from server  response OK, [13540ms] sending RPC 
> to server , [13540ms] received from server  response OK, [16819ms] sending 
> RPC to server , [16820ms] received from server  response OK, [19020ms] 
> sending RPC to server , [19020ms] received from server  response OK, 
> [21340ms] sending RPC to server , [21341ms] received from server  response 
> OK, [24660ms] sending RPC to server , [24661ms] received from server  
> response OK, [26800ms] sending RPC to server , [26800ms] received from server 
>  response OK, [27660ms] sending RPC to server , [27660ms] received from 
> server  response OK, [28480ms] sending RPC to server , [28481ms] received 
> from server



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
