[ https://issues.apache.org/jira/browse/KUDU-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon resolved KUDU-2329.
-------------------------------
    Resolution: Incomplete
 Fix Version/s: n/a

> Random RPC timeout errors when inserting rows in a Kudu table
> -------------------------------------------------------------
>
>                 Key: KUDU-2329
>                 URL: https://issues.apache.org/jira/browse/KUDU-2329
>             Project: Kudu
>          Issue Type: Bug
>          Components: rpc, server
>    Affects Versions: 1.5.0
>            Reporter: Héctor Gutiérrez
>            Priority: Major
>             Fix For: n/a
>
> When executing inserts into a Kudu table, we are experiencing errors at
> random times. The first time we found one of these errors was during a bulk
> update of a Kudu table via Spark (in Scala):
> {{kuduContext.updateRows(dataFrame, "table_name")}}
> The error message in Spark was the following:
> {{java.lang.RuntimeException: failed to write 579 rows from DataFrame to
> Kudu; sample errors: Timed out: can not complete before timeout: Batch
> {operations=6, tablet="cd1e33fce0114c9bbd9c14e2559e7591" [0x0000000F,
> 0x00000010), ignoreAllDuplicateRows=false, rpc=KuduRpc(method=Write,
> tablet=cd1e33fce0114c9bbd9c14e2559e7591, attempt=3,
> DeadlineTracker(timeout=30000, elapsed=30090), Traces: [0ms] sending RPC to
> server 6f273933b4d5498e87aadfb99b054a21, [10011ms] received from server
> 6f273933b4d5498e87aadfb99b054a21 response Network error: [peer
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the
> channel, [10011ms] delaying RPC due to Network error: [peer
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the
> channel, [10033ms] sending RPC to server 6f273933b4d5498e87aadfb99b054a21,
> [20050ms] received from server 6f273933b4d5498e87aadfb99b054a21 response
> Network error: [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read
> timeout; closing the channel, [20050ms] delaying RPC due to Network error:
> [peer 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing
> the channel, [20072ms] sending RPC to server
> 6f273933b4d5498e87aadfb99b054a21, [30090ms] received from server
> 6f273933b4d5498e87aadfb99b054a21 response Network error: [peer
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the
> channel, [30090ms] delaying RPC due to Network error: [peer
> 6f273933b4d5498e87aadfb99b054a21] encountered a read timeout; closing the
> channel)}
> }}
> (+ 4 more errors similar to this one in the error message)
> We first thought it was actually a problem with our Spark code, but when we
> tried to execute a simple "INSERT INTO" query from the impala shell into a
> Kudu table, we got the following error:
> {{[.............................] > insert into test_kudu values (282,
> 'hola');}}
> {{Query: insert into test_kudu values (282, 'hola')}}
> {{Query submitted at: ......................}}
> {{Query progress can be monitored at: ........................}}
> {{WARNINGS: Kudu error(s) reported, first error: Timed out: Failed to
> write batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1
> attempt(s): Failed to write to server: 071bcafbb1644678a697c474662047b7
> (.........................:7050): Write RPC to ....................:7050
> timed out after 179.949s (SENT)}}
> {{Error in Kudu table 'impala:kudu_db.test_kudu': Timed out: Failed to write
> batch of 1 ops to tablet 9c295e90811e483a9550bfd75abcf666 after 1 attempt(s):
> Failed to write to server: 071bcafbb1644678a697c474662047b7
> (...........................:7050): Write RPC to ......................:7050
> timed out after 179.949s (SENT)}}
> To make things even more confusing, despite getting this error in the impala
> shell, after a while (and not immediately), the inserted rows ended up in the
> table, so somehow they were actually inserted.
> We also tried tweaking the Kudu timeout configuration values that we had
> previously set, but it didn't solve anything and the problem kept appearing.
> Furthermore, we don't always get these errors, they only appear at random
> times. For example, right now we're just getting errors in that update we
> have in the Spark code, but we are not experiencing issues when working from
> the impala shell.
> After all that we have tried, we are pretty certain that this is a bug in
> Kudu, although we think it is a bit strange that it is undocumented and
> certainly it's hard to reproduce.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)