[
https://issues.apache.org/jira/browse/IMPALA-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286722#comment-17286722
]
Grant Henke commented on IMPALA-10481:
--------------------------------------
[~drorke] That would be awesome. Please do!
> Lack of TServer affinity in remote Kudu scans results in bad OS buffer cache
> behavior on tablet servers
> -------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-10481
> URL: https://issues.apache.org/jira/browse/IMPALA-10481
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.0
> Reporter: David Rorke
> Priority: Major
> Labels: kudu, performance
>
> Remote Kudu scans can take many iterations against the same scan range before
> achieving good performance if the OS buffer cache is initially cold on the
> tablet servers. The slow warmup of the buffer cache is exacerbated by the
> fact that remote scans in the default Impala config choose a tablet server at
> random from the replica candidates. The Kudu client supports a LEADER_ONLY
> option that provides hard affinity to the leader replica, and Impala allows
> this to be configured using the --pick_only_leaders_for_tests option, but
> this is currently considered a testing-only option, and by default Impala
> connects to a random replica.
> The following is a series of iterations of TPC-DS query 33 (times in
> seconds) against a freshly started Kudu cluster in 3 configurations: (1)
> local reads, with Impala running on the Kudu cluster; (2) remote reads from a
> separate Impala cluster with the default config; (3) remote reads with
> pick_only_leaders_for_tests=true (LEADER_ONLY affinity).
>
> ||Config||Iter 1||Iter 2||Iter 3||Iter 4||Iter 5||Iter 6||Iter 7||Iter 8||Iter 9||
> |Local|111.4|14.6| | | | | | | |
> |Remote (default config)|110.8|56.9|49.9|43.3|37.3|44.0|20.0|28.9|14.9|
> |Remote (LEADER_ONLY)|120.1|16.2|15.7|14.2| | | | | |
> With pick_only_leaders_for_tests, the remote performance improves quickly,
> approaching local performance on the second iteration and warming up fully by
> iteration 4. In the default config it takes 9 iterations of the query
> before we see the same performance.
> Running similar experiments after explicitly dropping the buffer cache on the
> tablet servers confirmed that this slow warmup is caused by poor buffer cache
> hit rates until the cache is fully warm.
> I suspect that slow warmup isn't the only consequence of this. Caching a
> given tablet in the buffer cache on multiple tablet servers increases the
> overall buffer cache footprint and will increase tserver memory pressure
> under load.
> We should consider setting the LEADER_ONLY option by default for remote Kudu
> reads. The only concern would be that this might result in worse load
> balancing and hotspots, in which case Kudu might need to implement some
> additional connection option that provides a better mix of affinity and load
> balancing.
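
To illustrate the warmup argument in the description, here is a minimal toy
simulation (not Impala or Kudu code). It assumes that every scan iteration
reads each tablet once, that a tablet is fully cached on a tablet server after
a single read there, and that nothing is ever evicted. The function name, the
parameters, and the use of replica 0 as the leader are all hypothetical. The
real numbers above warm up more gradually than this model does, so treat it as
a sketch of the mechanism only.

```python
import random


def simulate(num_tablets, num_replicas, iterations, leader_only, seed=0):
    """Toy model: return (cold_reads_per_iteration, cached_copies).

    A read is "cold" when the chosen replica has not served that tablet
    before, i.e. the data is not yet in that server's buffer cache.
    """
    rng = random.Random(seed)
    warm = set()  # (tablet, replica) pairs assumed to be in a buffer cache
    cold_per_iter = []
    for _ in range(iterations):
        cold = 0
        for tablet in range(num_tablets):
            # Replica 0 stands in for the leader (an assumption of this
            # sketch); otherwise pick uniformly among the replicas.
            replica = 0 if leader_only else rng.randrange(num_replicas)
            if (tablet, replica) not in warm:
                cold += 1
                warm.add((tablet, replica))
        cold_per_iter.append(cold)
    return cold_per_iter, len(warm)
```

Calling `simulate(300, 3, 9, leader_only=True)` and
`simulate(300, 3, 9, leader_only=False)` side by side shows both effects
described in the issue: under this model the leader-only run has no cold reads
after the first iteration, while the random run keeps hitting cold replicas for
several iterations and ends up with more cached copies in total.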