[ 
https://issues.apache.org/jira/browse/IMPALA-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Rorke resolved IMPALA-10481.
----------------------------------
    Resolution: Fixed

> Lack of TServer affinity in remote Kudu scans results in bad OS buffer cache 
> behavior on tablet servers
> -------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10481
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10481
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0.0
>            Reporter: David Rorke
>            Priority: Major
>              Labels: kudu, performance
>
> Remote Kudu scans can take many iterations against the same scan range before 
> achieving good performance if the OS buffer cache is initially cold on the 
> tablet servers.  The slow warmup of the buffer cache is exacerbated by the 
> fact that remote scans in the default Impala config choose a tablet server at 
> random from the replica candidates.  The Kudu client supports a LEADER_ONLY 
> option that provides hard affinity to the leader replica, and Impala allows 
> this to be configured using the --pick_only_leaders_for_tests option, but 
> this is currently considered a testing only option and by default Impala will 
> connect to a random replica.
> The following is a series of iterations of TPC-DS query 33 (times in 
> seconds), against a freshly started Kudu cluster, in 3 configurations (1) 
> local reads, with Impala running on Kudu cluster, (2) remote reads from 
> separate Impala cluster with default config, (3) remote reads with 
> pick_only_leaders_for_tests=true (LEADER_ONLY affinity)
>  
> ||Config||Iteration 1||Iter 2||Iter 3||Iter 4||Iter 5||Iter 6||Iter 7||Iter 
> 8||Iter 9||
> |Local|111.4|14.6| | | | | | | |
> |Remote (default config)|110.8|56.9|49.9|43.3|37.3|44.0|20.0|28.9|14.9|
> |Remote (LEADER_ONLY)|120.1|16.2|15.7|14.2| | | | | |
> With pick_only_leaders_for_tests, the remote performance improves quickly, 
> approaching local performance on the second iteration and warming up fully by 
> iteration 4.   In the default config it takes 9 iterations of the query 
> before we see the same performance.
> Running similar experiments after explicitly dropping the buffer cache on the 
> tablet servers confirmed that this slow warmup is caused by poor buffer cache 
> hit rates until the cache is fully warm.
> I suspect that slow warmup isn't the only consequence of this.  Caching a 
> given tablet in the buffer cache on multiple tablet servers increases the 
> overall buffer cache footprint and will increase tserver memory pressure 
> under load.
> We should consider setting the LEADER_ONLY option by default for remote Kudu 
> reads.  The only concern would be that this might result in worse load 
> balancing and hotspots, in which case Kudu might need to implement some 
> additional connection option that provides a better mix of affinity and load 
> balancing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to