[ 
https://issues.apache.org/jira/browse/KUDU-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin reassigned KUDU-3500:
-----------------------------------

    Assignee: Alexey Serbin

> Don't start write operations timed out in the tablet's prepare queue
> --------------------------------------------------------------------
>
>                 Key: KUDU-3500
>                 URL: https://issues.apache.org/jira/browse/KUDU-3500
>             Project: Kudu
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>
> While troubleshooting a performance issue where the prepare queue of a 
> tablet was very long, I noticed that tablet servers start write operations 
> that correspond to RPCs that have already timed out.  Most likely, the 
> client that sent the RPC has already detected the timeout and assumes that 
> the write has failed, so there is little point in starting such 
> operations at all.
> As a simple optimization, tablet servers shouldn't even start the PREPARE 
> phase for such operations, but respond with TimedOut error status right away 
> when such an operation is dispatched to the prepare thread.  Doing so would 
> help with clearing the queue and processing not-yet-timed-out requests from 
> the queue faster, increasing the overall robustness of a tablet server when 
> the load is high and the node's CPU and disk IO bandwidth are saturated.
> A new metric should be introduced to track the number of WriteRequestPB RPCs 
> that timed out in the prepare queue and were answered with a TimedOut error 
> status before the PREPARE phase started for the corresponding operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
