IMPALA-3201: buffer pool header only

Implementation will follow in a subsequent commit.

The BufferPool interface and reservation bookkeeping are documented.

Includes documentation for some guarantees of the new reservation
mechanism: reservations are always guaranteed, and pages can't be pinned
and buffers can't be allocated without a reservation. Reservations
are tracked via a hierarchy of ReservationTrackers.

Change-Id: Id771dea2eb4c1aa13c30d59e8b184a7d1bca8d34
Reviewed-on: http://gerrit.cloudera.org:8080/3992
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Internal Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/be6a3bc1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/be6a3bc1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/be6a3bc1

Branch: refs/heads/master
Commit: be6a3bc1a0d51163c8fb9a295cdae9288498ac9d
Parents: 5ec76c6
Author: Tim Armstrong <[email protected]>
Authored: Wed Jun 22 18:24:04 2016 -0700
Committer: Internal Jenkins <[email protected]>
Committed: Tue Aug 16 02:04:09 2016 +0000

----------------------------------------------------------------------
 be/src/bufferpool/buffer-pool.h | 318 +++++++++++++++++++++++++++++++++++
 1 file changed, 318 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/be6a3bc1/be/src/bufferpool/buffer-pool.h
----------------------------------------------------------------------
diff --git a/be/src/bufferpool/buffer-pool.h b/be/src/bufferpool/buffer-pool.h
new file mode 100644
index 0000000..183e538
--- /dev/null
+++ b/be/src/bufferpool/buffer-pool.h
@@ -0,0 +1,318 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef IMPALA_BUFFER_POOL_H
+#define IMPALA_BUFFER_POOL_H
+
+#include <boost/thread/locks.hpp>
+#include <string>
+#include <stdint.h>
+
+#include "bufferpool/buffer-allocator.h"
+#include "common/atomic.h"
+#include "common/status.h"
+#include "gutil/macros.h"
+#include "util/internal-queue.h"
+#include "util/spinlock.h"
+
+namespace impala {
+
+class ReservationTracker;
+
+/// A buffer pool that manages memory buffers for all queries in an Impala daemon.
+/// The buffer pool enforces buffer reservations, limits, and implements policies
+/// for moving spilled memory from in-memory buffers to disk. It also enables reuse of
+/// buffers between queries, to avoid frequent allocations.
+///
+/// The buffer pool can be used for allocating any large buffers (above a configurable
+/// minimum length), whether or not the buffers will be spilled. Smaller allocations
+/// are not serviced directly by the buffer pool: clients of the buffer pool must
+/// subdivide buffers if they wish to use smaller allocations.
+///
+/// All buffer pool operations are in the context of a registered buffer pool client.
+/// A buffer pool client should be created for every allocator of buffers at the level
+/// of granularity required for reporting and enforcement of reservations, e.g. an exec
+/// node. The client tracks buffer reservations via its ReservationTracker and also
+/// includes info that is helpful for debugging (e.g. the operator that is associated
+/// with the buffer). The client is not threadsafe, i.e. concurrent buffer pool
+/// operations should not be invoked for the same client.
+///
+/// TODO:
+/// * Implement spill-to-disk.
+/// * Decide on, document, and enforce upper limits on page size.
+///
+/// Pages, Buffers and Pinning
+/// ==========================
+/// * A page is a logical block of memory that can reside in memory or on disk.
+/// * A buffer is a physical block of memory that can hold a page in memory.
+/// * A page handle is used by buffer pool clients to identify and access a page and
+///   the corresponding buffer. Clients do not interact with pages directly.
+/// * A buffer handle is used by buffer pool clients to identify and access a buffer.
+/// * A page is pinned if it has pin count > 0. A pinned page stays mapped to the same
+///   buffer.
+/// * An unpinned page can be written out to disk by the buffer pool so that the buffer
+///   can be used for another purpose.
+///
+/// Buffer/Page Sizes
+/// =================
+/// The buffer pool has a minimum buffer size, which must be a power-of-two. Page and
+/// buffer sizes must be an exact multiple of the minimum buffer size.
+///
+/// Reservations
+/// ============
+/// Before allocating buffers or pinning pages, a client must reserve memory through its
+/// ReservationTracker. Reservation of n bytes gives a client the right to allocate
+/// buffers or pin pages summing up to n bytes. Reservations are both necessary and
+/// sufficient for a client to allocate buffers or pin pages: the operations succeed
+/// unless a "system error" such as a disk write error is encountered that prevents
+/// unpinned pages from being written to disk.
+///
+/// More memory may be reserved than is used, e.g. if a client is not using its full
+/// reservation. In such cases, the buffer pool can use the free buffers in any way,
+/// e.g. for keeping unpinned pages in memory, so long as it is able to fulfill the
+/// reservations when needed, e.g. by flushing unpinned pages to disk.
+///
+/// Page/Buffer Handles
+/// ===================
+/// The buffer pool exposes PageHandles and BufferHandles, which are owned by clients of
+/// the buffer pool, and act as a proxy for the internal data structure representing the
+/// page or buffer in the buffer pool. Handles are "open" if they are associated with a
+/// page or buffer. An open PageHandle is obtained by creating a page. PageHandles are
+/// closed by calling BufferPool::DestroyPage(). An open BufferHandle is obtained by
+/// allocating a buffer or extracting a BufferHandle from a PageHandle. A page's buffer
+/// can also be accessed through the PageHandle. The handle destructors check for
+/// resource leaks, e.g. an open handle that would result in a buffer leak.
+///
+/// Pin Counting of Page Handles:
+/// ----------------------------------
+/// Page handles are scoped to a client. The invariants are as follows:
+/// * A page can only be accessed through an open handle.
+/// * A page is destroyed once the handle is destroyed via DestroyPage().
+/// * A page's buffer can only be accessed through a pinned handle.
+/// * Pin() can be called on an open handle, incrementing the handle's pin count.
+/// * Unpin() can be called on a pinned handle, but not an unpinned handle.
+/// * Pin() always increases usage of reservations, and Unpin() always decreases usage,
+///   i.e. the handle consumes <pin count> * <page size> bytes of reservation.
+///
+/// Example Usage: Buffers
+/// ==================================
+/// The simplest use case is to allocate a memory buffer.
+/// * The new buffer is created with AllocateBuffer().
+/// * The client reads and writes to the buffer as it sees fit.
+/// * If the client is done with the buffer's contents it can call FreeBuffer() to
+///   destroy the handle and free the buffer, or use TransferBuffer() to transfer
+///   the buffer to a different client.
+///
+/// Example Usage: Spillable Pages
+/// ==============================
+/// * A spilling operator creates a new page with CreatePage().
+/// * The client reads and writes to the page's buffer as it sees fit.
+/// * If the operator encounters memory pressure, it can decrease reservation usage by
+///   calling Unpin() on the page. The page may then be written to disk and its buffer
+///   repurposed internally by BufferPool.
+/// * Once the operator needs the page's contents again and has sufficient unused
+///   reservations, it can call Pin(), which brings the page's contents back into memory,
+///   perhaps in a different buffer. Therefore the operator must fix up any pointers into
+///   the previous buffer.
+/// * If the operator is done with the page, it can call DestroyPage() to destroy the
+///   handle and release resources, or call ExtractBuffer() to extract the buffer.
+///
+/// Synchronization
+/// ===============
+/// The data structures in the buffer pool itself are thread-safe. Client-owned data
+/// structures - Client, PageHandle and BufferHandle - are not protected from concurrent
+/// access by the buffer pool: clients must ensure that they do not invoke concurrent
+/// operations with the same Client, PageHandle or BufferHandle.
+///
+/// +========================+
+/// | IMPLEMENTATION DETAILS |
+/// +========================+
+/// ... TODO ...
+class BufferPool {
+ public:
+  class Client;
+  class BufferHandle;
+  class PageHandle;
+
+  /// Constructs a new buffer pool.
+  /// 'min_buffer_len': the minimum buffer length for the pool. Must be a power of two.
+  /// 'buffer_bytes_limit': the maximum physical memory in bytes that can be used by the
+  ///     buffer pool. If 'buffer_bytes_limit' is not a multiple of 'min_buffer_len', the
+  ///     remainder will not be usable.
+  BufferPool(int64_t min_buffer_len, int64_t buffer_bytes_limit);
+  ~BufferPool();
+
+  /// Register a client. Returns an error status and does not register the client if the
+  /// arguments are invalid. 'name' is an arbitrary name used to identify the client in
+  /// any error messages or logging. 'client' is the client to register. 'client' should
+  /// not already be registered.
+  Status RegisterClient(const std::string& name, ReservationTracker* reservations,
+      Client* client);
+
+  /// Deregister 'client' if it is registered. Idempotent.
+  void DeregisterClient(Client* client);
+
+  /// Create a new page of 'len' bytes with pin count 1. 'len' must be a page length
+  /// supported by BufferPool (see BufferPool class comment). The client must have
+  /// sufficient unused reservations to pin the new page (otherwise it will DCHECK).
+  /// CreatePage() only fails when a system error prevents the buffer pool from fulfilling
+  /// the reservation.
+  /// On success, the handle is mapped to the new page.
+  Status CreatePage(Client* client, int64_t len, PageHandle* handle);
+
+  /// Increment the pin count of 'handle'. After Pin() the underlying page will
+  /// be mapped to a buffer, which will be accessible through 'handle'. Uses
+  /// reservation from 'client'. The caller is responsible for ensuring it has enough
+  /// unused reservation before calling Pin() (otherwise it will DCHECK). Pin() only
+  /// fails when a system error prevents the buffer pool from fulfilling the reservation.
+  /// 'handle' must be open.
+  Status Pin(Client* client, PageHandle* handle);
+
+  /// Decrement the pin count of 'handle'. Decrease client's reservation usage. If the
+  /// handle's pin count becomes zero, it is no longer valid for the underlying page's
+  /// buffer to be accessed via 'handle'. If the page's total pin count across all
+  /// handles that reference it goes to zero, the page's data may be written to disk and
+  /// the buffer reclaimed. 'handle' must be open and have a pin count > 0.
+  /// TODO: once we implement spilling, it will be an error to call Unpin() with
+  /// spilling disabled. E.g. if Impala is running without scratch (we want to be
+  /// able to test Unpin() before we implement actual spilling).
+  void Unpin(Client* client, PageHandle* handle);
+
+  /// Destroy the page referenced by 'handle' (if 'handle' is open). Any buffers or disk
+  /// storage backing the page are freed. Idempotent. If the page is pinned, the
+  /// reservation usage is decreased accordingly.
+  void DestroyPage(Client* client, PageHandle* handle);
+
+  /// Extracts buffer from a pinned page. After this returns, the page referenced by
+  /// 'page_handle' will be destroyed and 'buffer_handle' will reference the buffer from
+  /// 'page_handle'. This may decrease reservation usage if the page was pinned multiple
+  /// times via 'page_handle'.
+  void ExtractBuffer(PageHandle* page_handle, BufferHandle* buffer_handle);
+
+  /// Allocates a new buffer of 'len' bytes. Uses reservation from 'client'. The caller
+  /// is responsible for ensuring it has enough unused reservation before calling
+  /// AllocateBuffer() (otherwise it will DCHECK). AllocateBuffer() only fails when
+  /// a system error prevents the buffer pool from fulfilling the reservation.
+  Status AllocateBuffer(Client* client, int64_t len, BufferHandle* handle);
+
+  /// If 'handle' is open, close 'handle', free the buffer and decrease the
+  /// reservation usage from 'client'. Idempotent.
+  void FreeBuffer(Client* client, BufferHandle* handle);
+
+  /// Transfer ownership of buffer from 'src_client' to 'dst_client' and move the
+  /// handle from 'src' to 'dst'. Increases reservation usage in 'dst_client' and
+  /// decreases reservation usage in 'src_client'. 'src' must be open and 'dst' must
+  /// be closed before calling. After a successful call, 'src' is closed and 'dst' is open.
+  Status TransferBuffer(Client* src_client, BufferHandle* src, Client* dst_client,
+      BufferHandle* dst);
+
+  /// Print a debug string with the state of the buffer pool.
+  std::string DebugString();
+
+  int64_t min_buffer_len() const;
+  int64_t buffer_bytes_limit() const;
+
+ private:
+  DISALLOW_COPY_AND_ASSIGN(BufferPool);
+};
+
+/// External representation of a client of the BufferPool. Clients are used for
+/// reservation accounting, and will be used in the future for tracking per-client
+/// buffer pool counters. This class is the external handle for a client so
+/// each Client instance is owned by the BufferPool's client, rather than the BufferPool.
+/// Each Client should only be used by a single thread at a time: concurrently calling
+/// Client methods or BufferPool methods with the Client as an argument is not supported.
+class BufferPool::Client {
+ public:
+  Client() {}
+  /// Client must be deregistered.
+  ~Client() { DCHECK(!is_registered()); }
+
+  bool is_registered() const;
+  ReservationTracker* reservations();
+
+  std::string DebugString() const;
+
+ private:
+  DISALLOW_COPY_AND_ASSIGN(Client);
+};
+
+
+/// The handle for a page used by clients of the BufferPool. Each PageHandle should
+/// only be used by a single thread at a time: concurrently calling PageHandle methods
+/// or BufferPool methods with the PageHandle as an argument is not supported.
+class BufferPool::PageHandle {
+ public:
+  PageHandle();
+  ~PageHandle() { DCHECK(!is_open()); }
+
+  // Allow move construction of page handles, to support std::move().
+  PageHandle(PageHandle&& src);
+
+  // Allow move assignment of page handles, to support STL classes like std::vector.
+  // Destination must be closed.
+  PageHandle& operator=(PageHandle&& src);
+
+  bool is_open() const;
+  bool is_pinned() const;
+  int64_t len() const;
+  /// Get a pointer to the start of the page's buffer. Only valid to call if the page
+  /// is pinned via this handle.
+  uint8_t* data() const;
+
+  /// Return a pointer to the page's buffer handle. Only valid to call if the page is
+  /// pinned via this handle. Only const accessors of the returned handle can be used:
+  /// it is invalid to call FreeBuffer() or TransferBuffer() on it or to otherwise modify
+  /// the handle.
+  const BufferHandle* buffer_handle() const;
+
+  std::string DebugString() const;
+
+ private:
+  DISALLOW_COPY_AND_ASSIGN(PageHandle);
+};
+
+/// A handle to a buffer allocated from the buffer pool. Each BufferHandle should only
+/// be used by a single thread at a time: concurrently calling BufferHandle methods or
+/// BufferPool methods with the BufferHandle as an argument is not supported.
+class BufferPool::BufferHandle {
+ public:
+  BufferHandle();
+  ~BufferHandle() { DCHECK(!is_open()); }
+
+  /// Allow move construction of handles, to support std::move().
+  BufferHandle(BufferHandle&& src);
+
+  /// Allow move assignment of handles, to support STL classes like std::vector.
+  /// Destination must be uninitialized.
+  BufferHandle& operator=(BufferHandle&& src);
+
+  bool is_open() const;
+  int64_t len() const;
+  /// Get a pointer to the start of the buffer.
+  uint8_t* data() const;
+
+  std::string DebugString() const;
+
+ private:
+  DISALLOW_COPY_AND_ASSIGN(BufferHandle);
+};
+
+}
+
+#endif
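
Since the implementation is deferred to a later commit, the pin-counting invariants in the class comment (a handle consumes <pin count> * <page size> bytes of reservation; DestroyPage() is idempotent and releases whatever the handle still holds) can be illustrated with a small standalone model. This is only a sketch and is not part of the patch: ToyReservation, ToyPageHandle and the Toy* functions are hypothetical names invented here, plain assert() stands in for DCHECK, and no buffers or disk I/O are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a client's reservation (not Impala's ReservationTracker).
struct ToyReservation {
  int64_t reserved_bytes = 0;  // Bytes the client has reserved.
  int64_t used_bytes = 0;      // Bytes currently consumed by pinned pages.
  int64_t unused_bytes() const { return reserved_bytes - used_bytes; }
};

// Hypothetical stand-in for BufferPool::PageHandle: only length and pin count.
struct ToyPageHandle {
  int64_t len = 0;
  int pin_count = 0;
  bool open = false;
  bool is_pinned() const { return pin_count > 0; }
};

// Like CreatePage(): the new page starts with pin count 1 and consumes 'len' bytes.
inline void ToyCreatePage(ToyReservation* r, int64_t len, ToyPageHandle* h) {
  assert(!h->open);
  assert(r->unused_bytes() >= len);  // The caller must hold enough unused reservation.
  h->len = len;
  h->pin_count = 1;
  h->open = true;
  r->used_bytes += len;
}

// Like Pin(): every additional pin consumes another 'len' bytes of reservation.
inline void ToyPin(ToyReservation* r, ToyPageHandle* h) {
  assert(h->open);
  assert(r->unused_bytes() >= h->len);
  ++h->pin_count;
  r->used_bytes += h->len;
}

// Like Unpin(): only valid on a pinned handle; gives back 'len' bytes.
inline void ToyUnpin(ToyReservation* r, ToyPageHandle* h) {
  assert(h->open && h->is_pinned());
  --h->pin_count;
  r->used_bytes -= h->len;
}

// Like DestroyPage(): idempotent; releases pin_count * len bytes and closes the handle.
inline void ToyDestroyPage(ToyReservation* r, ToyPageHandle* h) {
  if (!h->open) return;
  r->used_bytes -= h->pin_count * h->len;
  *h = ToyPageHandle();
}
```

With a 4096-byte reservation and a 1024-byte page, creating the page uses 1024 bytes, a second Pin() raises usage to 2048, and DestroyPage() on the still-pinned handle returns usage to zero, after which the handle can be destructed without tripping the "must be closed" check.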

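The size rules stated in the header (the minimum buffer length is a power of two, page and buffer lengths are exact multiples of it, and any remainder of 'buffer_bytes_limit' is unusable) reduce to a few lines of arithmetic. The helpers below are a hedged sketch with hypothetical names (IsPowerOfTwo, IsValidLen, UsableBytes); the patch does not define them, and the eventual implementation may check these conditions differently.

```cpp
#include <cstdint>

// True if 'v' is a positive power of two.
inline bool IsPowerOfTwo(int64_t v) { return v > 0 && (v & (v - 1)) == 0; }

// True if 'len' is a legal page or buffer length for a pool whose minimum buffer
// length is 'min_buffer_len': a positive exact multiple of a power-of-two minimum.
inline bool IsValidLen(int64_t min_buffer_len, int64_t len) {
  return IsPowerOfTwo(min_buffer_len) && len > 0 && len % min_buffer_len == 0;
}

// Bytes of 'buffer_bytes_limit' that can actually back buffers: the limit rounded
// down to a multiple of 'min_buffer_len'. Assumes 'min_buffer_len' is valid.
inline int64_t UsableBytes(int64_t min_buffer_len, int64_t buffer_bytes_limit) {
  return buffer_bytes_limit - (buffer_bytes_limit % min_buffer_len);
}
```

For example, with an 8192-byte minimum, a 24576-byte page is legal while a 10000-byte page is not, and a limit of 20000 bytes leaves 16384 usable.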