[ 
https://issues.apache.org/jira/browse/KUDU-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-2329.
-------------------------------
       Resolution: Incomplete
    Fix Version/s: n/a

> Random RPC timeout errors when inserting rows in a Kudu table
> -------------------------------------------------------------
>
>                 Key: KUDU-2329
>                 URL: https://issues.apache.org/jira/browse/KUDU-2329
>             Project: Kudu
>          Issue Type: Bug
>          Components: rpc, server
>    Affects Versions: 1.5.0
>            Reporter: Héctor Gutiérrez
>            Priority: Major
>             Fix For: n/a
>
>
> When inserting into a Kudu table, we see errors at random times. We first 
> hit one of these errors during a bulk update of a Kudu table via Spark 
> (in Scala):
> {{kuduContext.updateRows(dataFrame, "table_name")}}
> The error message in Spark was the following:
> {{java.lang.RuntimeException: failed to write 579 rows from DataFrame to 
> Kudu; sample errors: Timed out: can not complete before timeout: Batch
> {operations=6, tablet="cd1e33fce0114c9bbd9c14e2559e7591" [0x0000000F, 
> 0x00000010), ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write, 
> tablet=cd1e33fce0114c9bbd9c14e2559e7591, attempt=3, 
> DeadlineTracker(timeout=30000, elapsed=30090), Traces: [0ms] sending RPC to 
> server 6f273933b4d5498e87aadfb99b054a21, [10011ms] received from server 
> 6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel, [10011ms] delaying RPC due to Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel, [10033ms] sending RPC to server 6f273933b4d5498e87aadfb99b054a21, 
> [20050ms] received from server 6f273933b4d5498e87aadfb99b054a21 response 
> Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read 
> timeout; closing the channel, [20050ms] delaying RPC due to Network error: 
> [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing 
> the channel, [20072ms] sending RPC to server 
> 6f273933b4d5498e87aadfb99b054a21, [30090ms] received from server 
> 6f273933b4d5498e87aadfb99b054a21 response Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel, [30090ms] delaying RPC due to Network error: [peer 
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the 
> channel)}
> }}
> (+ 4 more errors similar to this one in the error message)
> We first thought it was a problem with our Spark code, but when we 
> executed a simple "INSERT INTO" query from the Impala shell into a 
> Kudu table, we got the following error:
> {{[.............................] > insert into test_kudu values (282, 
> 'hola');}}
> {{Query: insert into test_kudu values (282, 'hola')}}
> {{Query submitted at: ......................}}
> {{Query progress can be monitored at: ........................}}
> {{WARNINGS: Kudu error(s) reported, first error: Timed out: Failed to 
> write batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 
> attempt(s): Failed to write to server: 071bcafbb1644678a697c474662047b7 
> (.........................:7050): Write RPC to ....................:7050 
> timed out after 179.949s (SENT)}}
> {{Error in Kudu table 'impala:kudu_db.test_kudu': Timed out: Failed to write 
> batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 attempt(s): 
> Failed to write to server: 071bcafbb1644678a697c474662047b7 
> (...........................:7050): Write RPC to ......................:7050 
> timed out after 179.949s (SENT)}}
> To make things even more confusing, despite this error in the Impala 
> shell, the inserted rows eventually appeared in the table (after a while, 
> not immediately), so the write did in fact succeed.
> We also tried tweaking the Kudu timeout configuration values that we had 
> previously set, but it didn't solve anything and the problem kept appearing.
> Furthermore, we don't always get these errors; they only appear at random 
> times. For example, right now we only get errors in the update in our 
> Spark code, and we are not experiencing issues when working from the 
> Impala shell.
> After all that we have tried, we are fairly certain that this is a bug in 
> Kudu, although it seems strange that it is undocumented, and it is 
> certainly hard to reproduce.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
