Repository: incubator-impala
Updated Branches:
  refs/heads/master ca2baa5d4 -> 2544f2394


IMPALA-3203: Part 1: Free list implementation

We will have a single free list per size class. Free buffers
are stored in a heap, ordered by the memory address of the
buffer, to help reduce address space fragmentation.
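
As an illustration of the ordering trick, here is a standalone
sketch (plain char* pointers instead of buffer handles; this is
not the BufferPool code itself). Inverting the comparator turns
std::push_heap's default max-heap into a min-heap by address, so
the lowest-addressed free buffer is always handed out first:

  #include <algorithm>
  #include <iostream>
  #include <vector>

  // Orders higher addresses lower in the heap, so the heap top is
  // always the free buffer with the lowest address.
  static bool HigherAddress(const char* a, const char* b) { return a > b; }

  int main() {
    char arena[192];  // Simulates three 64-byte buffers.
    std::vector<char*> free_list;
    for (char* p : {arena + 128, arena, arena + 64}) {
      free_list.push_back(p);
      std::push_heap(free_list.begin(), free_list.end(), HigherAddress);
    }
    // Popping returns the lowest address (the start of 'arena').
    std::pop_heap(free_list.begin(), free_list.end(), HigherAddress);
    std::cout << (free_list.back() == arena) << std::endl;  // Prints 1.
    return 0;
  }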

A follow-up patch will use the free lists in BufferPool.

Currently TCMalloc has thread-local caches that serve a somewhat
similar purpose. However, they are not suitable for our purposes
for several reasons:
* They are designed for caching small allocations - large allocations
  like most buffers are served from a global page heap protected
  by a global lock.
* We intend to move away from using TCMalloc for buffers: IMPALA-5073
* Thread caches are ineffective for the producer-consumer pattern where
  one thread allocates memory and another thread frees it.
* TCMalloc gives us limited control over how and when memory is
  actually released to the OS.

Testing:
Added unit tests for sanity checks and verification of behaviour
that is trickier to check in integration or system tests.
The code will be exercised more thoroughly via BufferPool
in Part 2.

Performance:
Includes a benchmark that demonstrates the scalability of
the free lists under concurrency. When measuring pure throughput
of free list operations, having a free list per core is
significantly faster than a shared free list or allocating
directly from TCMalloc. On 8 cores, if the allocated memory is
actually touched, then for 64KB+ buffers memory access, rather
than lock contention, is the bottleneck.

The benchmark also showed that non-inlined constructors and move
operators of BufferHandle were taking significant CPU cycles, so
I inlined those.

This suggests that having a free list per core is more than sufficient
(however, we will need to validate this with system concurrency
benchmarks once we switch to using this during query execution).
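
To make "a free list per core" concrete, here is a rough sketch
(hypothetical sharding by thread id via std::hash; the real
implementation will pair FreeList with a per-core lock, along the
lines of the benchmark's LockedList):

  #include <cstdlib>
  #include <functional>
  #include <mutex>
  #include <thread>
  #include <vector>

  // One cache-line-aligned shard per slot so threads on different
  // shards never contend on the same lock.
  struct alignas(64) Shard {
    std::mutex lock;
    std::vector<void*> buffers;  // Cached free buffers of one size class.
  };

  class ShardedFreeList {
   public:
    explicit ShardedFreeList(int num_shards) : shards_(num_shards) {}

    void* Allocate(size_t len) {
      Shard& s = LocalShard();
      {
        std::lock_guard<std::mutex> l(s.lock);
        if (!s.buffers.empty()) {
          void* buf = s.buffers.back();
          s.buffers.pop_back();
          return buf;  // Hit: reuse a cached buffer.
        }
      }
      return malloc(len);  // Miss: fall back to the system allocator.
    }

    void Free(void* buf) {
      Shard& s = LocalShard();
      std::lock_guard<std::mutex> l(s.lock);
      if (s.buffers.size() >= kMaxEntries) {
        free(buf);  // List full: release to the system allocator.
      } else {
        s.buffers.push_back(buf);
      }
    }

   private:
    // Hashing the thread id approximates per-core lists; a real
    // implementation would use the current core id instead.
    Shard& LocalShard() {
      size_t h = std::hash<std::thread::id>()(std::this_thread::get_id());
      return shards_[h % shards_.size()];
    }

    static constexpr size_t kMaxEntries = 64;
    std::vector<Shard> shards_;
  };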

Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b
Reviewed-on: http://gerrit.cloudera.org:8080/6410
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/5ecb9759
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/5ecb9759
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/5ecb9759

Branch: refs/heads/master
Commit: 5ecb97598ac0b33d99fdb84a624727952a5b1623
Parents: ca2baa5
Author: Tim Armstrong <[email protected]>
Authored: Tue Sep 20 00:47:06 2016 -0700
Committer: Impala Public Jenkins <[email protected]>
Committed: Tue Mar 21 01:52:45 2017 +0000

----------------------------------------------------------------------
 be/src/benchmarks/CMakeLists.txt              |   1 +
 be/src/benchmarks/free-lists-benchmark.cc     | 453 +++++++++++++++++++++
 be/src/runtime/bufferpool/CMakeLists.txt      |   1 +
 be/src/runtime/bufferpool/buffer-allocator.cc |  12 +-
 be/src/runtime/bufferpool/buffer-allocator.h  |   6 +-
 be/src/runtime/bufferpool/buffer-pool.cc      |  47 +--
 be/src/runtime/bufferpool/buffer-pool.h       |  41 +-
 be/src/runtime/bufferpool/free-list-test.cc   | 166 ++++++++
 be/src/runtime/bufferpool/free-list.h         | 127 ++++++
 be/src/util/benchmark-test.cc                 |   2 +-
 be/src/util/benchmark.cc                      |   7 +-
 be/src/util/benchmark.h                       |  14 +-
 12 files changed, 813 insertions(+), 64 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/benchmarks/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/be/src/benchmarks/CMakeLists.txt b/be/src/benchmarks/CMakeLists.txt
index 5d3dfbd..e954583 100644
--- a/be/src/benchmarks/CMakeLists.txt
+++ b/be/src/benchmarks/CMakeLists.txt
@@ -39,6 +39,7 @@ ADD_BE_BENCHMARK(bit-packing-benchmark)
 ADD_BE_BENCHMARK(bloom-filter-benchmark)
 ADD_BE_BENCHMARK(bswap-benchmark)
 ADD_BE_BENCHMARK(expr-benchmark)
+ADD_BE_BENCHMARK(free-lists-benchmark)
 ADD_BE_BENCHMARK(hash-benchmark)
 ADD_BE_BENCHMARK(in-predicate-benchmark)
 ADD_BE_BENCHMARK(int-hash-benchmark)

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/benchmarks/free-lists-benchmark.cc
----------------------------------------------------------------------
diff --git a/be/src/benchmarks/free-lists-benchmark.cc b/be/src/benchmarks/free-lists-benchmark.cc
new file mode 100644
index 0000000..43caeb6
--- /dev/null
+++ b/be/src/benchmarks/free-lists-benchmark.cc
@@ -0,0 +1,453 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <math.h>
+#include <random>
+#include <stdio.h>
+#include <stdlib.h>
+#include <iostream>
+#include <utility>
+#include <vector>
+
+#include <boost/thread.hpp>
+
+#include "common/object-pool.h"
+#include "gutil/strings/substitute.h"
+#include "runtime/bufferpool/buffer-allocator.h"
+#include "runtime/bufferpool/free-list.h"
+#include "util/aligned-new.h"
+#include "util/benchmark.h"
+#include "util/cpu-info.h"
+#include "util/mem-range.h"
+#include "util/stopwatch.h"
+
+#include "common/names.h"
+
+using namespace impala;
+using boost::thread;
+using boost::thread_group;
+using std::mt19937_64;
+using std::uniform_int_distribution;
+using strings::Substitute;
+
+// This benchmark tests free list performance under concurrency by measuring the
+// throughput of a batch of Allocate()/Free() operations that use the free list
+// to cache buffers. The benchmark measures total throughput across all threads.
+// The baseline is the throughput of a single thread that is not competing for any
+// resources.
+//
+// These Allocate() and Free() calls are interleaved with work that touches the
+// allocated memory, so that the benchmark exhibits a realistic level of lock
+// contention.
+//
+// The buffer size is varied from 64KB to 256KB (at 256KB, throughput is already limited
+// by memory bandwidth instead of the free lists so there is little value in collecting
+// more data points) and the number of iterations ("iters") over the allocated memory is
+// varied.
+//
+// Summary of results:
+// -------------------
+// In the 0 iters case, which measures pure throughput of free list operations,
+// ListPerThread has much higher throughput than SharedList because there is no
+// contention over locks. NoList measures performance of the underlying system
+// allocator (TCMalloc). TCMalloc performs poorly on large buffers compared to
+// ListPerThread because it must frequently access the shared global heap and
+// contend for a single global lock to serve large allocations.
+//
+// Performance of all three free list approaches (SharedList, ListPerThread, NoList)
+// converges for larger buffer sizes and # of iterations. Performance per core also
+// drops off as core count increases, regardless of whether the threads are contending
+// over locks. This suggests that typical workloads that do some non-trivial processing
+// over allocated buffers >= 64KB will usually be bottlenecked by memory access instead
+// of the allocator. However, the balance may shift on larger machines with more cores
+// as lock contention over a global mutex would become very expensive. Thus, sharing a
+// free list between a small number of cores may work reasonably in practice, but sharing
+// one between all cores will likely not scale at some point.
+//
+/*
+Machine Info: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
+
+FreeLists 0 iters on 64 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList               24.6       25     25.3         1X         1X         1X
+               2 threads SharedList               9.05     9.53     10.6     0.368X     0.381X      0.42X
+               4 threads SharedList               7.96     8.18     8.41     0.324X     0.327X     0.332X
+               8 threads SharedList               6.37      6.5     6.72     0.259X      0.26X     0.265X
+            1 threads ListPerThread               24.4       25     25.3     0.993X     0.998X     0.999X
+            2 threads ListPerThread               33.9     35.6     36.6      1.38X      1.42X      1.45X
+            4 threads ListPerThread               40.5     45.9     50.4      1.65X      1.83X      1.99X
+            8 threads ListPerThread               38.1     41.9     44.5      1.55X      1.67X      1.76X
+                   1 threads NoList               33.1     34.5     35.1      1.35X      1.38X      1.38X
+                   2 threads NoList               42.5     43.9     45.2      1.73X      1.75X      1.79X
+                   4 threads NoList               34.7     36.8     39.6      1.41X      1.47X      1.56X
+                   8 threads NoList               25.3     27.6     29.6      1.03X       1.1X      1.17X
+
+FreeLists 0 iters on 128 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList               19.9     20.2     20.7         1X         1X         1X
+               2 threads SharedList               8.85     9.35     10.3     0.445X     0.463X     0.496X
+               4 threads SharedList               7.72     7.93     8.09     0.388X     0.393X      0.39X
+               8 threads SharedList               6.21     6.38     6.54     0.312X     0.316X     0.315X
+            1 threads ListPerThread               19.8     20.2     20.7     0.997X         1X         1X
+            2 threads ListPerThread                 30     31.3     32.2      1.51X      1.55X      1.55X
+            4 threads ListPerThread               34.9     36.9     40.6      1.75X      1.83X      1.96X
+            8 threads ListPerThread               30.6     32.1     33.1      1.54X      1.59X       1.6X
+                   1 threads NoList               20.6     21.8     22.3      1.04X      1.08X      1.08X
+                   2 threads NoList               29.3     30.6       32      1.47X      1.51X      1.54X
+                   4 threads NoList               22.2     23.1     23.8      1.11X      1.14X      1.15X
+                   8 threads NoList                 14     14.3       15     0.704X      0.71X     0.722X
+
+FreeLists 0 iters on 256 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList               15.5     16.1     16.4         1X         1X         1X
+               2 threads SharedList               8.46     8.92     9.46     0.545X     0.555X     0.578X
+               4 threads SharedList               7.31     7.48     7.69     0.471X     0.465X      0.47X
+               8 threads SharedList               5.91     6.04     6.21     0.381X     0.376X     0.379X
+            1 threads ListPerThread               15.5       16     16.5     0.999X     0.995X      1.01X
+            2 threads ListPerThread               24.9       26     27.1       1.6X      1.62X      1.66X
+            4 threads ListPerThread               24.8     26.9     29.6       1.6X      1.67X      1.81X
+            8 threads ListPerThread               18.4     21.9     23.1      1.18X      1.36X      1.41X
+                   1 threads NoList               14.6     15.4     15.8     0.939X     0.956X     0.967X
+                   2 threads NoList               17.5     18.8     19.2      1.12X      1.17X      1.17X
+                   4 threads NoList               11.7     12.2     12.6     0.756X     0.757X     0.773X
+                   8 threads NoList               6.42     6.61     6.82     0.414X     0.411X     0.417X
+
+FreeLists 1 iters on 64 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList               1.38     1.39      1.4         1X         1X         1X
+               2 threads SharedList               1.77     1.81     1.84      1.28X       1.3X      1.31X
+               4 threads SharedList               1.86     1.96     2.03      1.35X      1.41X      1.45X
+               8 threads SharedList               1.55     1.68     1.77      1.13X      1.21X      1.26X
+            1 threads ListPerThread               1.38     1.39      1.4         1X         1X         1X
+            2 threads ListPerThread               2.05     2.13     2.19      1.49X      1.53X      1.56X
+            4 threads ListPerThread               2.07      2.2     2.32      1.51X      1.59X      1.66X
+            8 threads ListPerThread               1.52     1.63     1.71      1.11X      1.17X      1.22X
+                   1 threads NoList               1.04     1.05     1.05     0.754X     0.754X     0.748X
+                   2 threads NoList               1.57     1.58     1.61      1.14X      1.14X      1.15X
+                   4 threads NoList                1.9     2.09     2.17      1.38X       1.5X      1.55X
+                   8 threads NoList               1.49      1.6     1.73      1.08X      1.16X      1.24X
+
+FreeLists 1 iters on 128 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList              0.557    0.561    0.561         1X         1X         1X
+               2 threads SharedList               0.75    0.765    0.772      1.35X      1.36X      1.38X
+               4 threads SharedList              0.759    0.793    0.833      1.36X      1.41X      1.48X
+               8 threads SharedList              0.661    0.685    0.714      1.19X      1.22X      1.27X
+            1 threads ListPerThread              0.557    0.561    0.561         1X         1X         1X
+            2 threads ListPerThread              0.817    0.833     0.85      1.47X      1.48X      1.51X
+            4 threads ListPerThread              0.802    0.842    0.885      1.44X       1.5X      1.58X
+            8 threads ListPerThread              0.621    0.667    0.697      1.12X      1.19X      1.24X
+                   1 threads NoList                0.4    0.404    0.404     0.719X     0.719X     0.719X
+                   2 threads NoList              0.596    0.607    0.621      1.07X      1.08X      1.11X
+                   4 threads NoList              0.724    0.773     0.81       1.3X      1.38X      1.44X
+                   8 threads NoList              0.561    0.619     0.65      1.01X       1.1X      1.16X
+
+FreeLists 1 iters on 256 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList               0.25    0.252    0.255         1X         1X         1X
+               2 threads SharedList              0.355    0.365    0.369      1.42X      1.45X      1.45X
+               4 threads SharedList              0.297    0.306    0.318      1.19X      1.21X      1.25X
+               8 threads SharedList              0.315    0.324    0.337      1.26X      1.28X      1.32X
+            1 threads ListPerThread              0.252    0.255    0.255      1.01X      1.01X         1X
+            2 threads ListPerThread              0.355    0.358    0.381      1.42X      1.42X      1.49X
+            4 threads ListPerThread              0.296    0.309    0.345      1.18X      1.22X      1.35X
+            8 threads ListPerThread              0.291    0.297    0.304      1.17X      1.18X      1.19X
+                   1 threads NoList              0.143    0.143    0.143     0.571X     0.566X      0.56X
+                   2 threads NoList              0.224    0.226    0.229     0.897X     0.897X     0.897X
+                   4 threads NoList              0.275    0.283    0.294       1.1X      1.12X      1.15X
+                   8 threads NoList              0.278    0.288    0.297      1.11X      1.14X      1.17X
+
+FreeLists 2 iters on 64 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList              0.772    0.779    0.779         1X         1X         1X
+               2 threads SharedList               1.17     1.18     1.19      1.51X      1.51X      1.53X
+               4 threads SharedList               1.36      1.5     1.65      1.77X      1.92X      2.12X
+               8 threads SharedList                1.4     1.46     1.51      1.81X      1.87X      1.94X
+            1 threads ListPerThread              0.772    0.779    0.779         1X         1X         1X
+            2 threads ListPerThread               1.27     1.29      1.3      1.65X      1.65X      1.67X
+            4 threads ListPerThread                1.6     1.73     1.87      2.08X      2.22X       2.4X
+            8 threads ListPerThread               1.48     1.54     1.59      1.92X      1.98X      2.04X
+                   1 threads NoList              0.644     0.65     0.65     0.834X     0.835X     0.835X
+                   2 threads NoList               1.09      1.1     1.11      1.41X      1.41X      1.43X
+                   4 threads NoList               1.45     1.63     1.82      1.88X       2.1X      2.34X
+                   8 threads NoList               1.43      1.5     1.54      1.85X      1.92X      1.97X
+
+FreeLists 2 iters on 128 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList              0.365    0.365    0.369         1X         1X         1X
+               2 threads SharedList              0.587    0.593    0.604      1.61X      1.62X      1.64X
+               4 threads SharedList              0.636    0.667    0.701      1.74X      1.82X       1.9X
+               8 threads SharedList              0.582    0.644    0.655      1.59X      1.76X      1.78X
+            1 threads ListPerThread              0.365    0.369    0.369         1X      1.01X         1X
+            2 threads ListPerThread              0.586    0.591    0.621       1.6X      1.62X      1.68X
+            4 threads ListPerThread               0.64    0.714    0.765      1.75X      1.95X      2.07X
+            8 threads ListPerThread              0.571    0.613     0.65      1.56X      1.68X      1.76X
+                   1 threads NoList              0.294    0.297    0.297     0.805X     0.813X     0.805X
+                   2 threads NoList              0.455     0.46    0.468      1.25X      1.26X      1.27X
+                   4 threads NoList               0.55     0.62     0.67      1.51X       1.7X      1.82X
+                   8 threads NoList              0.519    0.539    0.566      1.42X      1.48X      1.54X
+
+FreeLists 2 iters on 256 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList               0.15     0.15    0.151         1X         1X         1X
+               2 threads SharedList              0.226    0.233    0.235      1.51X      1.56X      1.56X
+               4 threads SharedList              0.263    0.275    0.286      1.76X      1.84X      1.89X
+               8 threads SharedList               0.26    0.288    0.306      1.74X      1.93X      2.03X
+            1 threads ListPerThread               0.15     0.15    0.151         1X         1X         1X
+            2 threads ListPerThread              0.245    0.248     0.27      1.64X      1.66X      1.79X
+            4 threads ListPerThread              0.238    0.263    0.278      1.59X      1.76X      1.84X
+            8 threads ListPerThread               0.22    0.231     0.24      1.47X      1.54X      1.59X
+                   1 threads NoList              0.115    0.117    0.117     0.772X     0.779X     0.772X
+                   2 threads NoList              0.182    0.185    0.187      1.22X      1.24X      1.24X
+                   4 threads NoList              0.226    0.243    0.255      1.51X      1.63X      1.69X
+                   8 threads NoList              0.238     0.25    0.262      1.59X      1.67X      1.73X
+
+FreeLists 4 iters on 64 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList              0.414    0.414    0.418         1X         1X         1X
+               2 threads SharedList              0.714    0.721    0.728      1.72X      1.74X      1.74X
+               4 threads SharedList              0.883    0.965      1.1      2.13X      2.33X      2.63X
+               8 threads SharedList               1.09     1.11     1.14      2.62X      2.68X      2.73X
+            1 threads ListPerThread              0.411    0.414    0.418     0.991X         1X         1X
+            2 threads ListPerThread               0.75    0.759    0.765      1.81X      1.83X      1.83X
+            4 threads ListPerThread              0.953     1.08     1.25       2.3X      2.61X      2.99X
+            8 threads ListPerThread               1.03     1.09     1.12      2.48X      2.62X      2.68X
+                   1 threads NoList              0.381    0.385    0.426     0.919X     0.928X      1.02X
+                   2 threads NoList              0.679    0.685    0.694      1.64X      1.65X      1.66X
+                   4 threads NoList              0.924     1.05     1.18      2.23X      2.52X      2.81X
+                   8 threads NoList               1.06     1.11     1.15      2.55X      2.68X      2.75X
+
+FreeLists 4 iters on 128 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList              0.214    0.216    0.216         1X         1X         1X
+               2 threads SharedList              0.352    0.355    0.355      1.65X      1.65X      1.65X
+               4 threads SharedList              0.362     0.46    0.519      1.69X      2.13X      2.41X
+               8 threads SharedList              0.442    0.464    0.481      2.07X      2.15X      2.23X
+            1 threads ListPerThread              0.214    0.216    0.216         1X         1X         1X
+            2 threads ListPerThread              0.352    0.355    0.358      1.65X      1.65X      1.66X
+            4 threads ListPerThread              0.376      0.5    0.545      1.76X      2.32X      2.52X
+            8 threads ListPerThread                0.4    0.426    0.447      1.87X      1.97X      2.07X
+                   1 threads NoList              0.165    0.167    0.167     0.773X     0.773X     0.773X
+                   2 threads NoList              0.301    0.304    0.306      1.41X      1.41X      1.42X
+                   4 threads NoList              0.345    0.434    0.482      1.62X      2.01X      2.24X
+                   8 threads NoList              0.404    0.434    0.451      1.89X      2.01X      2.09X
+
+FreeLists 4 iters on 256 kb:Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
+                                                                         (relative) (relative) (relative)
+---------------------------------------------------------------------------------------------------------
+               1 threads SharedList              0.102    0.103    0.103         1X         1X         1X
+               2 threads SharedList              0.162    0.167    0.168      1.59X      1.62X      1.64X
+               4 threads SharedList              0.202    0.222    0.238      1.98X      2.16X      2.31X
+               8 threads SharedList              0.204    0.222     0.24         2X      2.16X      2.33X
+            1 threads ListPerThread              0.102    0.102    0.103         1X     0.991X         1X
+            2 threads ListPerThread              0.163    0.167    0.167       1.6X      1.62X      1.62X
+            4 threads ListPerThread              0.188    0.204     0.22      1.85X      1.98X      2.14X
+            8 threads ListPerThread              0.187    0.198      0.2      1.84X      1.93X      1.95X
+                   1 threads NoList             0.0833   0.0833   0.0833     0.818X     0.811X     0.811X
+                   2 threads NoList              0.137     0.14     0.14      1.35X      1.36X      1.36X
+                   4 threads NoList              0.176     0.19    0.206      1.73X      1.85X         2X
+                   8 threads NoList              0.176    0.208    0.218      1.73X      2.02X      2.12X
+*/
+enum class FreeListMode {
+  SHARED_LIST, // One list shared between all threads.
+  LIST_PER_THREAD, // One list per thread.
+  NO_LIST, // No list - just allocate and free directly.
+};
+
+struct BenchmarkParams {
+  /// Number of concurrent threads.
+  int num_threads;
+
+  /// How to set up the free lists.
+  FreeListMode free_list_mode;
+
+  /// Number of iterations of work to do between freelist operations.
+  int work_iterations;
+
+  /// Size of allocation to make.
+  int64_t allocation_size;
+};
+
+/// Ensure that locks don't share cache lines.
+struct LockedList : public CacheLineAligned {
+  SpinLock lock;
+  FreeList list;
+};
+
+static const char* ModeString(FreeListMode mode) {
+  switch (mode) {
+    case FreeListMode::SHARED_LIST:
+      return "SharedList";
+    case FreeListMode::LIST_PER_THREAD:
+      return "ListPerThread";
+    case FreeListMode::NO_LIST:
+      return "NoList";
+    default:
+      return "Invalid";
+  }
+}
+
+static const int NUM_OPERATIONS_PER_BATCH = 512;
+// Choose a list size large enough so that most allocs should hit in the free list
+// but small enough that it will sometimes fill up.
+static const int MAX_LIST_ENTRIES = 64;
+
+static const int ALLOC_OP = 0;
+static const int FREE_OP = 1;
+
+static BufferAllocator allocator(64 * 1024);
+
+// Simulate doing some work with the buffer.
+void DoWork(uint8_t* data, int64_t len) {
+  // Touch all the data in the allocation. This is about the minimum amount of
+  // work we're likely to do with a buffer in a real workload.
+  memset(data, 1, len);
+}
+
+/// Make an allocation and do some work on it.
+void DoAlloc(const BenchmarkParams& params, LockedList* free_list,
+    vector<BufferPool::BufferHandle>* buffers) {
+  BufferPool::BufferHandle buffer;
+  bool got_buffer = false;
+  if (free_list != nullptr) {
+    lock_guard<SpinLock> l(free_list->lock);
+    got_buffer = free_list->list.PopFreeBuffer(&buffer);
+  }
+  if (!got_buffer) {
+    Status status = allocator.Allocate(params.allocation_size, &buffer);
+    if (!status.ok()) LOG(FATAL) << "Failed alloc " << status.msg().msg();
+  }
+  // Do some processing to simulate a vaguely realistic work pattern.
+  for (int j = 0; j < params.work_iterations; ++j) {
+    DoWork(buffer.data(), buffer.len());
+  }
+  buffers->emplace_back(move(buffer));
+}
+
+/// Free an allocation.
+void DoFree(const BenchmarkParams& params, LockedList* free_list,
+    vector<BufferPool::BufferHandle>* buffers) {
+  if (!buffers->empty()) {
+    if (free_list != nullptr) {
+      lock_guard<SpinLock> l(free_list->lock);
+      FreeList* list = &free_list->list;
+      list->AddFreeBuffer(move(buffers->back()));
+      if (list->Size() > MAX_LIST_ENTRIES) {
+        // Discard around 1/4 of the buffers to amortise the cost of sorting.
+        list->FreeBuffers(&allocator, list->Size() - MAX_LIST_ENTRIES * 3 / 4);
+      }
+    } else {
+      allocator.Free(move(buffers->back()));
+    }
+    buffers->pop_back();
+  }
+}
+
+/// Execute 'num_operations' on 'free_list' according to the parameters in
+/// 'params'.
+void FreeListBenchmarkThread(int thread_id, int num_operations,
+    const BenchmarkParams& params, LockedList* free_list) {
+  mt19937_64 rng(thread_id);
+  vector<BufferPool::BufferHandle> buffers;
+  int ops_done = 0;
+  while (ops_done < num_operations) {
+    // Execute a number of the same operation in a row.
+    const int64_t op = uniform_int_distribution<int64_t>(0, 1)(rng);
+    const int num_ops = uniform_int_distribution<int64_t>(1, 10)(rng);
+    for (int i = 0; i < num_ops; ++i) {
+      if (op == ALLOC_OP) {
+        DoAlloc(params, free_list, &buffers);
+      } else {
+        DCHECK_EQ(op, FREE_OP);
+        DoFree(params, free_list, &buffers);
+      }
+    }
+    ops_done += num_ops;
+  }
+
+  for (BufferHandle& buffer : buffers) allocator.Free(move(buffer));
+}
+
+/// Execute the benchmark with the BenchmarkParams passed via 'data'.
+void FreeListBenchmark(int batch_size, void* data) {
+  ObjectPool pool;
+  const BenchmarkParams& params = *static_cast<BenchmarkParams*>(data);
+  thread_group threads;
+  vector<LockedList*> free_lists;
+  if (params.free_list_mode == FreeListMode::SHARED_LIST) {
+    free_lists.push_back(pool.Add(new LockedList));
+  }
+
+  for (int i = 0; i < params.num_threads; ++i) {
+    if (params.free_list_mode == FreeListMode::LIST_PER_THREAD) {
+      free_lists.push_back(pool.Add(new LockedList));
+    }
+    // Divide operations between threads.
+    int ops_per_thread = batch_size * NUM_OPERATIONS_PER_BATCH / params.num_threads;
+    auto* free_list = free_lists.empty() ? nullptr : free_lists.back();
+    threads.add_thread(new thread(FreeListBenchmarkThread, i,
+        ops_per_thread, params, free_list));
+  }
+  threads.join_all();
+
+  // Empty out all of the free lists.
+  for (LockedList* free_list : free_lists) {
+    free_list->list.FreeAll(&allocator);
+  }
+}
+
+int main(int argc, char** argv) {
+  CpuInfo::Init();
+  cout << endl << Benchmark::GetMachineInfo() << endl << endl;
+
+  ObjectPool pool;
+
+  for (int work_iterations : {0, 1, 2, 4}) {
+    // Don't test allocations beyond 256KB - by that point we are purely memory-bound.
+    for (int allocation_size_kb : {64, 128, 256}) {
+      Benchmark suite(Substitute("FreeLists $0 iters on $1 kb",
+            work_iterations, allocation_size_kb));
+      for (FreeListMode free_list_mode : {FreeListMode::SHARED_LIST,
+               FreeListMode::LIST_PER_THREAD, FreeListMode::NO_LIST}) {
+        // Vary concurrency up to the # of logical cores.
+        for (int num_threads : {1, 2, 4, CpuInfo::num_cores()}) {
+          BenchmarkParams* params = pool.Add(new BenchmarkParams());
+          params->num_threads = num_threads;
+          params->free_list_mode = free_list_mode;
+          params->work_iterations = work_iterations;
+          params->allocation_size = allocation_size_kb * 1024;
+          suite.AddBenchmark(
+              Substitute("$0 threads $1", num_threads, ModeString(free_list_mode)),
+              FreeListBenchmark, params);
+        }
+      }
+      // Increase benchmark time to get more accurate result.
+      cout << suite.Measure(100) << endl;
+    }
+  }
+}

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/runtime/bufferpool/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/be/src/runtime/bufferpool/CMakeLists.txt b/be/src/runtime/bufferpool/CMakeLists.txt
index 9151807..9f98968 100644
--- a/be/src/runtime/bufferpool/CMakeLists.txt
+++ b/be/src/runtime/bufferpool/CMakeLists.txt
@@ -30,5 +30,6 @@ add_library(BufferPool
 add_dependencies(BufferPool thrift-deps)
 
 ADD_BE_TEST(buffer-pool-test)
+ADD_BE_TEST(free-list-test)
 ADD_BE_TEST(reservation-tracker-test)
 ADD_BE_TEST(suballocator-test)

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/runtime/bufferpool/buffer-allocator.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/bufferpool/buffer-allocator.cc b/be/src/runtime/bufferpool/buffer-allocator.cc
index 3befe76..7b7d216 100644
--- a/be/src/runtime/bufferpool/buffer-allocator.cc
+++ b/be/src/runtime/bufferpool/buffer-allocator.cc
@@ -24,16 +24,18 @@ namespace impala {
 BufferAllocator::BufferAllocator(int64_t min_buffer_len)
   : min_buffer_len_(min_buffer_len) {}
 
-Status BufferAllocator::Allocate(int64_t len, uint8_t** buffer) {
+Status BufferAllocator::Allocate(int64_t len, BufferPool::BufferHandle* buffer) {
   DCHECK_GE(len, min_buffer_len_);
   DCHECK_EQ(len, BitUtil::RoundUpToPowerOfTwo(len));
 
-  *buffer = reinterpret_cast<uint8_t*>(malloc(len));
-  if (*buffer == NULL) return Status(TErrorCode::BUFFER_ALLOCATION_FAILED, 
len);
+  uint8_t* alloc = reinterpret_cast<uint8_t*>(malloc(len));
+  if (alloc == NULL) return Status(TErrorCode::BUFFER_ALLOCATION_FAILED, len);
+  buffer->Open(alloc, len);
   return Status::OK();
 }
 
-void BufferAllocator::Free(uint8_t* buffer, int64_t len) {
-  free(buffer);
+void BufferAllocator::Free(BufferPool::BufferHandle&& buffer) {
+  free(buffer.data());
+  buffer.Reset(); // Avoid DCHECK in ~BufferHandle().
 }
 }

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/runtime/bufferpool/buffer-allocator.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/bufferpool/buffer-allocator.h b/be/src/runtime/bufferpool/buffer-allocator.h
index 0c6e36e..4358f75 100644
--- a/be/src/runtime/bufferpool/buffer-allocator.h
+++ b/be/src/runtime/bufferpool/buffer-allocator.h
@@ -20,6 +20,8 @@
 
 #include "common/status.h"
 
+#include "runtime/bufferpool/buffer-pool.h"
+
 namespace impala {
 
 /// The underlying memory allocator for the buffer pool. All buffers are allocated through
@@ -35,10 +37,10 @@ class BufferAllocator {
 
   /// Allocate memory for a buffer of 'len' bytes. 'len' must be a power-of-two multiple
   /// of the minimum buffer length.
-  Status Allocate(int64_t len, uint8_t** buffer) WARN_UNUSED_RESULT;
+  Status Allocate(int64_t len, BufferPool::BufferHandle* buffer) WARN_UNUSED_RESULT;
 
   /// Free the memory for a previously-allocated buffer.
-  void Free(uint8_t* buffer, int64_t len);
+  void Free(BufferPool::BufferHandle&& buffer);
 
  private:
   const int64_t min_buffer_len_;

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/runtime/bufferpool/buffer-pool.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/bufferpool/buffer-pool.cc b/be/src/runtime/bufferpool/buffer-pool.cc
index 842501f..2503beb 100644
--- a/be/src/runtime/bufferpool/buffer-pool.cc
+++ b/be/src/runtime/bufferpool/buffer-pool.cc
@@ -35,38 +35,12 @@ DEFINE_int32(concurrent_scratch_ios_per_device, 2,
 
 namespace impala {
 
-BufferPool::BufferHandle::BufferHandle() {
-  Reset();
-}
-
-BufferPool::BufferHandle::BufferHandle(BufferHandle&& src) {
-  Reset();
-  *this = std::move(src);
-}
-
-BufferPool::BufferHandle& BufferPool::BufferHandle::operator=(BufferHandle&& src) {
-  DCHECK(!is_open());
-  // Copy over all members then close src.
-  client_ = src.client_;
-  data_ = src.data_;
-  len_ = src.len_;
-  src.Reset();
-  return *this;
-}
-
-void BufferPool::BufferHandle::Open(
-    const ClientHandle* client, uint8_t* data, int64_t len) {
-  client_ = client;
+void BufferPool::BufferHandle::Open(uint8_t* data, int64_t len) {
+  client_ = nullptr;
   data_ = data;
   len_ = len;
 }
 
-void BufferPool::BufferHandle::Reset() {
-  client_ = NULL;
-  data_ = NULL;
-  len_ = -1;
-}
-
 BufferPool::PageHandle::PageHandle() {
   Reset();
 }
@@ -245,14 +219,13 @@ Status BufferPool::AllocateBufferInternal(
     int64_t to_evict = len - delta;
     RETURN_IF_ERROR(EvictCleanPages(to_evict));
   }
-  uint8_t* data;
-  Status status = allocator_->Allocate(len, &data);
+  Status status = allocator_->Allocate(len, buffer);
   if (!status.ok()) {
     buffer_bytes_remaining_.Add(len);
     return status;
   }
-  DCHECK(data != NULL);
-  buffer->Open(client, data, len);
+  DCHECK(buffer->is_open());
+  buffer->client_ = client;
   return Status::OK();
 }
 
@@ -265,8 +238,9 @@ void BufferPool::FreeBuffer(ClientHandle* client, BufferHandle* handle) {
 
 void BufferPool::FreeBufferInternal(BufferHandle* handle) {
   DCHECK(handle->is_open());
-  allocator_->Free(handle->data(), handle->len());
-  buffer_bytes_remaining_.Add(handle->len());
+  int64_t buffer_len = handle->len();
+  allocator_->Free(move(*handle));
+  buffer_bytes_remaining_.Add(buffer_len);
   handle->Reset();
 }
 
@@ -329,10 +303,7 @@ Status BufferPool::EvictCleanPages(int64_t bytes_to_evict) {
 
   // Free buffers after releasing all the locks. Do this regardless of success to avoid
   // leaking buffers.
-  for (BufferHandle& buffer : buffers) {
-    allocator_->Free(buffer.data(), buffer.len());
-    buffer.Reset();
-  }
+  for (BufferHandle& buffer : buffers) allocator_->Free(move(buffer));
   if (bytes_found < bytes_to_evict) {
     // The buffer pool should not be overcommitted so this should only happen if there
     // is an accounting error. Add any freed buffers back to 'buffer_bytes_remaining_'

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/runtime/bufferpool/buffer-pool.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/bufferpool/buffer-pool.h b/be/src/runtime/bufferpool/buffer-pool.h
index e27e645..b199637 100644
--- a/be/src/runtime/bufferpool/buffer-pool.h
+++ b/be/src/runtime/bufferpool/buffer-pool.h
@@ -360,15 +360,16 @@ class BufferPool::ClientHandle {
 /// BufferPool methods with the BufferHandle as an argument is not supported.
 class BufferPool::BufferHandle {
  public:
-  BufferHandle();
+  BufferHandle() { Reset(); }
   ~BufferHandle() { DCHECK(!is_open()); }
 
-  /// Allow move construction of handles, to support std::move().
-  BufferHandle(BufferHandle&& src);
+  /// Allow move construction of handles, to support std::move(). Inline to make moving
+  /// efficient.
+  inline BufferHandle(BufferHandle&& src);
 
   /// Allow move assignment of handles, to support STL classes like std::vector.
-  /// Destination must be uninitialized.
-  BufferHandle& operator=(BufferHandle&& src);
+  /// Destination must be uninitialized. Inline to make moving efficient.
+  inline BufferHandle& operator=(BufferHandle&& src);
 
   bool is_open() const { return data_ != NULL; }
   int64_t len() const {
@@ -388,12 +389,14 @@ class BufferPool::BufferHandle {
  private:
   DISALLOW_COPY_AND_ASSIGN(BufferHandle);
   friend class BufferPool;
+  friend class BufferAllocator;
 
   /// Internal helper to set the handle to an opened state.
-  void Open(const ClientHandle* client, uint8_t* data, int64_t len);
+  void Open(uint8_t* data, int64_t len);
 
-  /// Internal helper to reset the handle to an unopened state.
-  void Reset();
+  /// Internal helper to reset the handle to an unopened state. Inlined to make moving
+  /// efficient.
+  inline void Reset();
 
   /// The client the buffer handle belongs to, used to validate that the correct client
   /// is provided in BufferPool method calls.
@@ -460,6 +463,28 @@ class BufferPool::PageHandle {
   /// is being used.
   const ClientHandle* client_;
 };
+
+inline BufferPool::BufferHandle::BufferHandle(BufferHandle&& src) {
+  Reset();
+  *this = std::move(src);
+}
+
+inline BufferPool::BufferHandle& BufferPool::BufferHandle::operator=(
+    BufferHandle&& src) {
+  DCHECK(!is_open());
+  // Copy over all members then close src.
+  client_ = src.client_;
+  data_ = src.data_;
+  len_ = src.len_;
+  src.Reset();
+  return *this;
+}
+
+inline void BufferPool::BufferHandle::Reset() {
+  client_ = NULL;
+  data_ = NULL;
+  len_ = -1;
+}
 }
 
 #endif

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/runtime/bufferpool/free-list-test.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/bufferpool/free-list-test.cc b/be/src/runtime/bufferpool/free-list-test.cc
new file mode 100644
index 0000000..190faa9
--- /dev/null
+++ b/be/src/runtime/bufferpool/free-list-test.cc
@@ -0,0 +1,166 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cstdlib>
+#include <random>
+
+#include "common/object-pool.h"
+#include "runtime/bufferpool/free-list.h"
+#include "testutil/gtest-util.h"
+
+#include "common/names.h"
+
+namespace impala {
+
+class FreeListTest : public ::testing::Test {
+ protected:
+  virtual void SetUp() override {
+    allocator_ = obj_pool_.Add(new BufferAllocator(MIN_BUFFER_LEN));
+    SeedRng();
+  }
+
+  virtual void TearDown() override {
+    allocator_ = nullptr;
+    obj_pool_.Clear();
+  }
+
+  /// Seed 'rng_' with a seed either from the environment or based on the current time.
+  void SeedRng() {
+    const char* seed_str = getenv("FREE_LIST_TEST_SEED");
+    int64_t seed = seed_str != nullptr ? atoi(seed_str) : time(nullptr);
+    LOG(INFO) << "Random seed: " << seed;
+    rng_.seed(seed);
+  }
+
+  void AllocateBuffers(int num_buffers, int64_t buffer_len,
+      vector<BufferHandle>* buffers) {
+    for (int i = 0; i < num_buffers; ++i) {
+      BufferHandle buffer;
+      ASSERT_OK(allocator_->Allocate(buffer_len, &buffer));
+      buffers->push_back(move(buffer));
+    }
+  }
+
+  vector<const void*> GetSortedAddrs(const vector<BufferHandle>& buffers) {
+    vector<const void*> addrs;
+    for (const BufferHandle& buffer : buffers) addrs.push_back(buffer.data());
+    std::sort(addrs.begin(), addrs.end());
+    return addrs;
+  }
+
+  void AddFreeBuffers(FreeList* list, vector<BufferHandle>* buffers) {
+    for (BufferHandle& buffer : *buffers) {
+      list->AddFreeBuffer(move(buffer));
+    }
+    buffers->clear();
+  }
+
+  void FreeBuffers(vector<BufferHandle>* buffers) {
+    for (BufferHandle& buffer : *buffers) {
+      allocator_->Free(move(buffer));
+    }
+    buffers->clear();
+  }
+
+  const static int MIN_BUFFER_LEN = 1024;
+
+  /// Per-test random number generator. Seeded before every test.
+  std::mt19937 rng_;
+
+  /// Pool for objects with per-test lifetime. Cleared after every test.
+  ObjectPool obj_pool_;
+
+  /// The buffer allocator, owned by 'obj_pool_'.
+  BufferAllocator* allocator_;
+};
+
+const int FreeListTest::MIN_BUFFER_LEN;
+
+// Functional test for a small free list.
+TEST_F(FreeListTest, SmallList) {
+  const int LIST_SIZE = 2;
+  FreeList small_list;
+
+  // PopFreeBuffer() on empty list returns false.
+  BufferHandle buffer;
+  ASSERT_FALSE(small_list.PopFreeBuffer(&buffer));
+  ASSERT_FALSE(small_list.PopFreeBuffer(&buffer));
+
+  // Add various numbers of buffers to the free list and check that they're
+  // either freed or returned in the order expected.
+  for (int num_buffers = 0; num_buffers <= LIST_SIZE + 2; ++num_buffers) {
+    for (int attempt = 0; attempt < 10; ++attempt) {
+      LOG(INFO) << "num_buffers " << num_buffers << " attempt " << attempt;
+      vector<BufferHandle> buffers;
+      AllocateBuffers(num_buffers, MIN_BUFFER_LEN, &buffers);
+
+      // Keep track of the addresses so we can validate the buffer order.
+      const vector<const void*> addrs = GetSortedAddrs(buffers);
+
+      // Try shuffling to make sure we don't always just add in ascending order.
+      std::shuffle(buffers.begin(), buffers.end(), rng_);
+      AddFreeBuffers(&small_list, &buffers);
+      // Shrink list down to LIST_SIZE.
+      small_list.FreeBuffers(allocator_, max<int64_t>(0, small_list.Size() - LIST_SIZE));
+
+      // The LIST_SIZE buffers with the lowest address should be retained, and the
+      // remaining buffers should have been freed.
+      for (int i = 0; i < min(num_buffers, LIST_SIZE); ++i) {
+        ASSERT_TRUE(small_list.PopFreeBuffer(&buffer)) << i;
+        ASSERT_EQ(addrs[i], buffer.data()) << i;
+        buffers.push_back(move(buffer));
+      }
+      ASSERT_FALSE(small_list.PopFreeBuffer(&buffer));
+      FreeBuffers(&buffers);
+    }
+  }
+}
+
+// Functional test that makes sure the free lists return buffers in ascending order.
+TEST_F(FreeListTest, ReturnOrder) {
+  const int LIST_SIZE = 100;
+  FreeList list;
+  for (int num_buffers = 50; num_buffers <= 400; num_buffers *= 2) {
+    for (int attempt = 0; attempt < 5; ++attempt) {
+      LOG(INFO) << "num_buffers " << num_buffers << " attempt " << attempt;
+      vector<BufferHandle> buffers;
+      AllocateBuffers(num_buffers, MIN_BUFFER_LEN, &buffers);
+
+      // Keep track of the addresses so we can validate the buffer order.
+      const vector<const void*> addrs = GetSortedAddrs(buffers);
+
+      // Try shuffling to make sure we don't always just add in ascending order.
+      std::shuffle(buffers.begin(), buffers.end(), rng_);
+      AddFreeBuffers(&list, &buffers);
+
+      // Free buffers. Only the buffers with the high addresses should be freed.
+      list.FreeBuffers(allocator_, max<int64_t>(0, list.Size() - LIST_SIZE));
+
+      // Validate that the buffers with lowest addresses are returned in ascending order.
+      BufferHandle buffer;
+      for (int i = 0; i < min(LIST_SIZE, num_buffers); ++i) {
+        ASSERT_TRUE(list.PopFreeBuffer(&buffer));
+        ASSERT_EQ(addrs[i], buffer.data()) << i;
+        allocator_->Free(move(buffer));
+      }
+      ASSERT_FALSE(list.PopFreeBuffer(&buffer));
+    }
+  }
+}
+}
+
+IMPALA_TEST_MAIN();

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/runtime/bufferpool/free-list.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/bufferpool/free-list.h b/be/src/runtime/bufferpool/free-list.h
new file mode 100644
index 0000000..6cec0e8
--- /dev/null
+++ b/be/src/runtime/bufferpool/free-list.h
@@ -0,0 +1,127 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef IMPALA_RUNTIME_BUFFERPOOL_FREE_LIST_H
+#define IMPALA_RUNTIME_BUFFERPOOL_FREE_LIST_H
+
+#include <algorithm>
+#include <cstdint>
+#include <vector>
+
+#include <boost/thread/locks.hpp>
+
+#include "common/logging.h"
+#include "gutil/macros.h"
+#include "runtime/bufferpool/buffer-allocator.h"
+#include "runtime/bufferpool/buffer-pool.h"
+
+namespace impala {
+
+using BufferHandle = BufferPool::BufferHandle;
+
+/// A non-threadsafe list of free buffers.
+///
+/// Buffers are allocated by the caller and can be added to the list for later retrieval
+/// with AddFreeBuffer(). If the list is non-empty, calling PopFreeBuffer() will return
+/// one of the buffers previously added to the list. FreeList is agnostic about the size
+/// or other properties of the buffers added to it.
+///
+/// Buffers in the list can be freed at any point, e.g. if the list is storing too many
+/// free buffers (according to some policy). The caller is responsible for implementing
+/// the policy and calling FreeBuffers() or FreeAll() at the appropriate times.
+///
+/// Address space fragmentation
+/// ---------------------------
+/// To reduce memory fragmentation, the free list hands out buffers with lower memory
+/// addresses first and frees buffers with higher memory address first. If buffers were
+/// handed out by a policy that didn't take memory address into account, over time the
+/// distribution of free buffers within the address space would become essentially
+/// random. If free buffers were then unmapped, there would be many holes in the virtual
+/// memory map, which can cause difficulties for the OS in some cases, e.g. exceeding the
+/// maximum number of mmapped() regions (vm.max_map_count) in Linux. Using this approach
+/// will tend to consolidate free buffers in higher parts of the address space, allowing
+/// coalescing of the holes in most cases.
+class FreeList {
+ public:
+  FreeList() {}
+
+  /// Gets a free buffer. If the list is non-empty, returns true and sets 'buffer' to
+  /// one of the buffers previously added with AddFreeBuffer(). Otherwise returns false.
+  bool PopFreeBuffer(BufferHandle* buffer) {
+    if (free_list_.empty()) return false;
+    std::pop_heap(free_list_.begin(), free_list_.end(), HeapCompare);
+    *buffer = std::move(free_list_.back());
+    free_list_.pop_back();
+    return true;
+  }
+
+  /// Adds a free buffer to the list.
+  void AddFreeBuffer(BufferHandle&& buffer) {
+    free_list_.emplace_back(std::move(buffer));
+    std::push_heap(free_list_.begin(), free_list_.end(), HeapCompare);
+  }
+
+  /// Frees all the buffers in the list with 'allocator'. Returns the number of bytes
+  /// freed.
+  int64_t FreeAll(BufferAllocator* allocator) {
+    return FreeBuffers(allocator, free_list_.size());
+  }
+
+  /// Free 'num_buffers' buffers from the list with 'allocator'. Returns the number of
+  /// bytes freed. The average time complexity is n log n, where n is the current size of
+  /// the list.
+  int64_t FreeBuffers(BufferAllocator* allocator, int64_t num_buffers) {
+    DCHECK_LE(num_buffers, free_list_.size());
+    // Sort the list so we can free the buffers with higher memory addresses.
+    // Note that the sorted list is still a valid min-heap.
+    std::sort(free_list_.begin(), free_list_.end(), SortCompare);
+
+    int64_t bytes_freed = 0;
+    for (int64_t i = 0; i < num_buffers; ++i) {
+      bytes_freed += free_list_.back().len();
+      allocator->Free(std::move(free_list_.back()));
+      free_list_.pop_back();
+    }
+    return bytes_freed;
+  }
+
+  /// Returns the number of buffers currently in the list.
+  int64_t Size() const { return free_list_.size(); }
+
+ private:
+  friend class FreeListTest;
+
+  DISALLOW_COPY_AND_ASSIGN(FreeList);
+
+  /// Compare function that orders by memory address.
+  inline static bool SortCompare(const BufferHandle& b1, const BufferHandle& b2) {
+    return b1.data() < b2.data();
+  }
+
+  /// Compare function that orders by memory address. Needs to be inverse of SortCompare()
+  /// because C++ provides a max-heap.
+  inline static bool HeapCompare(const BufferHandle& b1, const BufferHandle& b2) {
+    return SortCompare(b2, b1);
+  }
+
+  /// List of free memory buffers. Maintained as a min-heap ordered by the memory address
+  /// of the buffer.
+  std::vector<BufferHandle> free_list_;
+};
+}
+
+#endif

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/util/benchmark-test.cc
----------------------------------------------------------------------
diff --git a/be/src/util/benchmark-test.cc b/be/src/util/benchmark-test.cc
index fd1ed81..6597387 100644
--- a/be/src/util/benchmark-test.cc
+++ b/be/src/util/benchmark-test.cc
@@ -39,7 +39,7 @@ struct MemcpyData {
 class BenchmarkTest {
  public:
   static double Measure(Benchmark::BenchmarkFunction fn, void* data) {
-    return Benchmark::Measure(fn, data);
+    return Benchmark::Measure(fn, data, 50, 10);
   }
 };
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/util/benchmark.cc
----------------------------------------------------------------------
diff --git a/be/src/util/benchmark.cc b/be/src/util/benchmark.cc
index 0001398..77b046e 100644
--- a/be/src/util/benchmark.cc
+++ b/be/src/util/benchmark.cc
@@ -50,7 +50,7 @@ double Benchmark::Measure(BenchmarkFunction function, void* args,
     // in 20% increments.
     // TODO: we can make this more sophisticated if need to be dynamically ramp up and
     // ramp down the sizes.
-    batch_size = (iters_guess - iters) / 5;
+    batch_size = max<int>(1, (iters_guess - iters) / 5);
   }
 
   while (sw.ElapsedTime() < target_cycles) {
@@ -85,7 +85,7 @@ int Benchmark::AddBenchmark(const string& name, BenchmarkFunction fn, void* args
   return benchmarks_.size() - 1;
 }
 
-string Benchmark::Measure() {
+string Benchmark::Measure(int max_time, int initial_batch_size) {
   if (benchmarks_.empty()) return "";
 
   // Run a warmup to iterate through the data
@@ -116,7 +116,8 @@ string Benchmark::Measure() {
   stringstream ss;
   for (int j = 0; j < NUM_REPS; ++j) {
     for (int i = 0; i < benchmarks_.size(); ++i) {
-      benchmarks_[i].rates.push_back(Measure(benchmarks_[i].fn, benchmarks_[i].args));
+      benchmarks_[i].rates.push_back(
+          Measure(benchmarks_[i].fn, benchmarks_[i].args, max_time, initial_batch_size));
     }
   }
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/5ecb9759/be/src/util/benchmark.h
----------------------------------------------------------------------
diff --git a/be/src/util/benchmark.h b/be/src/util/benchmark.h
index 1bb8f3e..e7cb53c 100644
--- a/be/src/util/benchmark.h
+++ b/be/src/util/benchmark.h
@@ -50,7 +50,11 @@ class Benchmark {
       int baseline_idx = 0);
 
   /// Runs all the benchmarks and returns the result in a formatted string.
-  std::string Measure();
+  /// max_time is the total time to benchmark the function, in ms.
+  /// initial_batch_size is the initial batch size to the run the function.  The
+  /// harness function will automatically ramp up the batch_size.  The benchmark
+  /// will take *at least* initial_batch_size * function invocation time.
+  std::string Measure(int max_time = 50, int initial_batch_size = 10);
 
   /// Output machine/build configuration as a string
   static std::string GetMachineInfo();
@@ -60,12 +64,8 @@ class Benchmark {
 
   /// Benchmarks the 'function' returning the result as invocations per ms.
   /// args is an opaque argument passed as the second argument to the function.
-  /// max_time is the total time to benchmark the function, in ms.
-  /// initial_batch_size is the initial batch size to the run the function.  The
-  /// harness function will automatically ramp up the batch_size.  The benchmark
-  /// will take *at least* initial_batch_size * function invocation time.
-  static double Measure(BenchmarkFunction function, void* args,
-      int max_time = 50, int initial_batch_size = 10);
+  static double Measure(BenchmarkFunction function, void* args, int max_time,
+      int initial_batch_size);
 
   struct BenchmarkResult {
     std::string name;
