[ https://issues.apache.org/jira/browse/IMPALA-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Rorke resolved IMPALA-10481.
----------------------------------
    Resolution: Fixed

> Lack of TServer affinity in remote Kudu scans results in bad OS buffer cache
> behavior on tablet servers
> -------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10481
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10481
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0.0
>            Reporter: David Rorke
>            Priority: Major
>              Labels: kudu, performance
>
> Remote Kudu scans can take many iterations against the same scan range before
> achieving good performance if the OS buffer cache is initially cold on the
> tablet servers. The slow warmup of the buffer cache is exacerbated by the
> fact that remote scans in the default Impala config choose a tablet server at
> random from the replica candidates. The Kudu client supports a LEADER_ONLY
> option that provides hard affinity to the leader replica, and Impala allows
> this to be configured using the --pick_only_leaders_for_tests option, but
> this is currently considered a testing-only option, and by default Impala
> connects to a random replica.
>
> The following are the times (in seconds) for a series of iterations of
> TPC-DS query 33 against a freshly started Kudu cluster, in three
> configurations: (1) local reads, with Impala running on the Kudu cluster;
> (2) remote reads from a separate Impala cluster with the default config;
> (3) remote reads with pick_only_leaders_for_tests=true (LEADER_ONLY
> affinity).
>
> ||Config||Iter 1||Iter 2||Iter 3||Iter 4||Iter 5||Iter 6||Iter 7||Iter 8||Iter 9||
> |Local|111.4|14.6| | | | | | | |
> |Remote (default config)|110.8|56.9|49.9|43.3|37.3|44.0|20.0|28.9|14.9|
> |Remote (LEADER_ONLY)|120.1|16.2|15.7|14.2| | | | | |
>
> With pick_only_leaders_for_tests, remote performance improves quickly,
> approaching local performance on the second iteration and warming up fully by
> iteration 4.
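A toy model helps show why random replica selection warms up so much more slowly than LEADER_ONLY affinity. The sketch below is hypothetical and is not part of Impala or the Kudu client: it assumes every query iteration scans every tablet exactly once, that a replica's OS buffer cache is warm for a tablet after serving it once, and that nothing is ever evicted. The `simulate` function and its "leader_only"/"random" policy names are illustrative stand-ins for the two replica selection modes discussed in the issue.

```python
import random


def simulate(num_tablets, num_replicas, iterations, policy, seed=0):
    """Toy model of OS buffer cache warmup on Kudu tablet servers.

    Hypothetical assumptions (not from the issue): every query iteration scans
    every tablet exactly once, a replica's cache is warm for a tablet after it
    has served that tablet once, and nothing is ever evicted.

    policy is "leader_only" (always replica 0, standing in for the leader) or
    "random" (uniform choice among replicas, like the default remote scan).
    Returns (cold_fraction_per_iteration, cached_copies_after_last_iteration).
    """
    rng = random.Random(seed)
    warm = set()  # (tablet, replica) pairs whose data is already in the cache
    cold_fractions = []
    for _ in range(iterations):
        cold = 0
        for tablet in range(num_tablets):
            if policy == "leader_only":
                replica = 0
            elif policy == "random":
                replica = rng.randrange(num_replicas)
            else:
                raise ValueError(f"unknown policy: {policy}")
            if (tablet, replica) not in warm:
                cold += 1  # this read misses the cache and goes to disk
                warm.add((tablet, replica))
        cold_fractions.append(cold / num_tablets)
    # Each warm (tablet, replica) pair is one cached copy of that tablet's data.
    return cold_fractions, len(warm)


if __name__ == "__main__":
    for policy in ("leader_only", "random"):
        fractions, copies = simulate(10_000, 3, 9, policy)
        print(policy, [round(f, 2) for f in fractions], "cached copies:", copies)
```

Under these assumptions, with three replicas the expected fraction of cold reads on iteration k under random selection is (2/3)^(k-1): about 67% on iteration 2, 30% on iteration 4 and 4% on iteration 9, while leader-only has no cold reads after iteration 1. The cached-copies count also tends toward three per tablet instead of one, which corresponds to the buffer cache footprint concern in the issue. Real scans are more complicated (partial scan ranges, eviction, Kudu's own block cache), so this only illustrates the qualitative trend, not the measured timings.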
> In the default config, it takes 9 iterations of the query before we see
> comparable performance.
>
> Running similar experiments after explicitly dropping the buffer cache on the
> tablet servers confirmed that this slow warmup is caused by poor buffer cache
> hit rates until the cache is fully warm.
>
> I suspect that slow warmup isn't the only consequence. Caching a given tablet
> in the buffer cache on multiple tablet servers increases the overall buffer
> cache footprint and will increase tserver memory pressure under load.
>
> We should consider setting the LEADER_ONLY option by default for remote Kudu
> reads. The only concern is that this might result in worse load balancing and
> hotspots, in which case Kudu might need to implement an additional connection
> option that provides a better mix of affinity and load balancing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)