[
https://issues.apache.org/jira/browse/KUDU-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Serbin updated KUDU-3500:
--------------------------------
Code Review: https://gerrit.cloudera.org/#/c/20300/
> Don't start write operations timed out in the tablet's prepare queue
> --------------------------------------------------------------------
>
> Key: KUDU-3500
> URL: https://issues.apache.org/jira/browse/KUDU-3500
> Project: Kudu
> Issue Type: Improvement
> Components: tserver
> Reporter: Alexey Serbin
> Assignee: Alexey Serbin
> Priority: Major
>
> While troubleshooting a performance issue where the prepare queue of a
> tablet was very long, I noticed that tablet servers start write operations
> that correspond to RPCs that have already timed out. Most likely, the client
> that sent the RPC has already detected the timeout and considers the
> write failed, so there is little point in starting such operations.
> As a simple optimization, tablet servers shouldn't even start the PREPARE
> phase for such operations, but should respond with a TimedOut error status
> right away when dispatching them to the prepare thread. Doing so would help
> clear the prepare queue and process not-yet-timed-out requests from the
> queue faster, increasing the overall robustness of a tablet server when the
> load is high and the node's CPU and disk IO bandwidth are saturated.
> A new metric should be introduced to track the number of WriteRequestPB RPCs
> that timed out in the prepare queue and were answered with a TimedOut error
> status before the PREPARE phase started for the corresponding operations.
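
The proposal above can be illustrated with a minimal, self-contained sketch. It is not Kudu code and does not reflect the change under review: PendingWrite, PrepareQueue, Drain and the timed_out_in_queue counter are hypothetical names chosen for illustration, and the counter only stands in for the metric the description asks for. The sketch shows a single consumer checking each operation's deadline when it is taken off the queue, answering with a TimedOut-style error if the deadline has passed, and starting the prepare step otherwise.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a queued write operation; not a Kudu type.
struct PendingWrite {
  Clock::time_point deadline;     // deadline propagated from the client RPC
  std::function<void()> prepare;  // starts the PREPARE phase
  std::function<void(const std::string&)> respond_error;  // replies with an error
};

// Hypothetical prepare queue drained by a single prepare thread.
class PrepareQueue {
 public:
  void Submit(PendingWrite op) {
    // Both callbacks must be set, since Drain() invokes one of them.
    assert(op.prepare && op.respond_error);
    queue_.push_back(std::move(op));
  }

  // Processes every queued operation in FIFO order, as the prepare thread
  // would. `now` is passed in so the behavior is deterministic in tests.
  void Drain(Clock::time_point now) {
    while (!queue_.empty()) {
      PendingWrite op = std::move(queue_.front());
      queue_.pop_front();
      if (now >= op.deadline) {
        // The RPC has already timed out: skip PREPARE, count it, and
        // respond right away so the queue clears faster.
        ++timed_out_in_queue_;
        op.respond_error("TimedOut: write timed out in the prepare queue");
        continue;
      }
      op.prepare();
    }
  }

  // Hypothetical counter standing in for the proposed metric.
  uint64_t timed_out_in_queue() const { return timed_out_in_queue_; }

 private:
  std::deque<PendingWrite> queue_;
  uint64_t timed_out_in_queue_ = 0;
};
```

Checking the deadline at dequeue time rather than at enqueue time is the point of the sketch: an operation may be well within its deadline when submitted and still expire while waiting behind a long queue.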