[ 
https://issues.apache.org/jira/browse/KUDU-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-3500:
--------------------------------
    Code Review: https://gerrit.cloudera.org/#/c/20300/

> Don't start write operations timed out in the tablet's prepare queue
> --------------------------------------------------------------------
>
>                 Key: KUDU-3500
>                 URL: https://issues.apache.org/jira/browse/KUDU-3500
>             Project: Kudu
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>
> While troubleshooting a performance issue where the prepare queue of a 
> tablet was very long, I noticed that tablet servers start write operations 
> that correspond to RPCs that have already timed out.  Most likely, the client 
> that sent the RPC has already detected the timeout and expects the 
> write to have failed, so there isn't much sense in starting such 
> operations anyway.
> As a simple optimization, tablet servers shouldn't even start the PREPARE 
> phase for such operations, but respond with TimedOut error status right away 
> when dispatching them to the prepare thread.  Doing so would help with 
> clearing the prepare queue and processing not-yet-timed-out requests from the 
> queue faster, increasing the overall robustness of a tablet server when the 
> load is high and the node's CPU and disk IO bandwidth are saturated.
> A new metric should be introduced to track the number of WriteRequestPB RPCs 
> that timed out in the prepare queue and were responded to with TimedOut error 
> status before starting the PREPARE phase for the corresponding operations.
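
The proposal above can be illustrated with a minimal, self-contained C++ sketch. It is not Kudu's actual code: PendingWrite, PrepareQueue, Drain() and the timed_out_in_queue counter are hypothetical names chosen for illustration, and a plain integer stands in for the proposed metric. The sketch only shows the idea of checking each operation's RPC deadline when it is taken off the prepare queue, rejecting expired operations with a TimedOut-style status, and counting them.

```cpp
#include <chrono>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a write operation waiting in the prepare queue.
struct PendingWrite {
  Clock::time_point deadline;                      // deadline of the originating RPC
  std::function<void()> prepare;                   // starts the PREPARE phase
  std::function<void(const std::string&)> reject;  // responds with an error status
};

// Hypothetical single-threaded model of a tablet's prepare queue.
class PrepareQueue {
 public:
  void Submit(PendingWrite op) { queue_.push_back(std::move(op)); }

  // Dispatches every queued operation. Operations whose deadline has already
  // passed are rejected right away instead of entering the PREPARE phase.
  void Drain(Clock::time_point now) {
    while (!queue_.empty()) {
      PendingWrite op = std::move(queue_.front());
      queue_.pop_front();
      if (now >= op.deadline) {
        ++timed_out_in_queue_;  // stand-in for the proposed metric
        op.reject("TimedOut: write timed out in the prepare queue");
        continue;
      }
      op.prepare();
    }
  }

  uint64_t timed_out_in_queue() const { return timed_out_in_queue_; }

 private:
  std::deque<PendingWrite> queue_;
  uint64_t timed_out_in_queue_ = 0;
};
```

With this model, an operation submitted with a deadline in the past is answered through reject() and bumps the counter, while an operation whose deadline is still ahead proceeds to prepare() as usual.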



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
