Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3201: headers and reservation logic for new buffer pool ......................................................................
Patch Set 17: (37 comments) http://gerrit.cloudera.org:8080/#/c/2569/17/be/src/bufferpool/buffer-allocator.h File be/src/bufferpool/buffer-allocator.h: Line 22: /// Memory allocator for buffers that are multiples of the minimum buffer length. unclear what this is: who uses it, what is the minimum buffer length? http://gerrit.cloudera.org:8080/#/c/2569/15/be/src/bufferpool/buffer-pool.h File be/src/bufferpool/buffer-pool.h: Line 34: /// Centralizing buffer management in a single object allows enforcement of buffer this doesn't need to read like a design doc (ie, you don't have to present the pros and cons) Line 37: /// buffers across queries, rather than reallocating buffers for every query. rather than saying 'it allows us to...' you should simply describe what it does, possibly in bullet points Line 40: /// of at least the minimum buffer length that won't be spilled can also be served by "can also be" is vague; also, i didn't see that functionality here PS15, Line 41: s are neede what this is communicating is basically that there are minimum allocation sizes. rather than being vague and talking about clients, it's more helpful to describe what the audience of this interface is (exec nodes? other allocators? ...). PS15, Line 44: buffer since the client is an important part of the interface, i feel there's information missing. who needs to allocate this client (ie, per exec node, per fragment, per query...)? is it thread-safe? are there other guarantees for the client? PS15, Line 45: reservations differentiate client against other classes (ReservationTracker) Line 55: /// * A page is a logical block of memory that can reside in memory or on disk. there is a need for an additional concept: buffers for mempools will never be spilled. Line 59: /// * A page is pinned if it is pinned by least one buffer pool client. A pinned page appears to assume sharing Line 61: /// * An unpinned page can be written out to disk by the buffer pool so that the buffer it's preferable to defer talking about implementation; here you should focus on user-visible semantics. feel free to have an additional section below to talk about implementation highlights Line 68: /// The buffer pool is of limited size and clients of the buffer pool often need here also: don't talk about what clients "often need", but what this class provides Line 75: /// If a client pins the same page multiple times, additional reservations are consumed. i don't think this makes sense. it sounds like we should prevent pinning of already pinned pages, otherwise we're wasting memory (the reservation prevents other queries from claiming that memory) Line 81: /// reserved buffers in any way, e.g. for keeping unpinned pages in memory, so long as i don't think buffers should be "reserved" (which sounds like "set aside"). a reservation guarantees that you can later get a buffer, but buffers should only be either free or backing a pinned page (or otherwise in use; see mempool comment). Line 88: /// in order to avoid memory lifetime issues such as dangling or invalid pointers to very general comment/argument that doesn't need to be made here PS15, Line 99: a cli > Done why page? from my understanding, pinning only applies to page, not buffers (buffers either hold data or they don't). PS15, Line 101: page c > Done i'd make the same argument here: transfer of resources involves resources. buffers are resources, pages are logical constructs. transfer of ownership also needs to work out of mempools (which don't have pages). Line 102: /// the same handle. sound expensive. what's the rationale against handing off the handle? Line 112: /// implementation detail. worth pointing out in an 'implementation details' section PS15, Line 147: On su > Done in the interest of modularity, you could also give PageHandle interface functions, such as Pin/Unpin/Close. Line 149: create/close don't sound like duals of each other. create/destroy? get/release? Line 157: /// storage backing the page is freed. Idempotent. what does it mean to call this on a handle that we previously closed? Line 224: }; follow formatting convention Line 253: /// All pages pinned by the client count as consumption against 'reservations_' why? Line 259: /// designed to only be used by a single thread at a time: concurrently calling > Added to the class comment. follow formatting conventions (and elsewhere) Line 265: PageHandle() { Reset(); } might as well call this buffer http://gerrit.cloudera.org:8080/#/c/2569/15/be/src/bufferpool/reservation-tracker.h File be/src/bufferpool/reservation-tracker.h: Line 35: /// is entitled to consumed and a 'consumption': the actual amount that is consumed. to consume http://gerrit.cloudera.org:8080/#/c/2569/17/be/src/bufferpool/reservation-tracker.h File be/src/bufferpool/reservation-tracker.h: Line 37: /// nodes. To increase its reservation, a tracker must consume some of its parent's in l35 consumption refers to actual memory usage, not the reservation. now you use the same term 'consumption' to refer to 'subtracting from reservation'. keep the two concepts separated by separate terminology (decrease reservation/consume memory?). Line 49: /// * A tracker's consumption is at least the sum of its childrens' reservations. that doesn't make sense to me if consumption = memory consumption. i would expect that a reservationtracker's reservation = the sum of its childrens' reservations. Line 52: /// Each ReservationTracker can optionally have an associated MemTracker. Any why optionally? if this class tracks memory consumption, i'd expect that to be done via a memtracker. Line 57: /// MemTracker, so reservations count against both the exec node and the query MemTracker. i don't understand that either, this seems to conflate concepts (reservation vs actual consumption). Line 60: /// one level. Otherwise reservations will be double-counted against MemTrackers. should the linking be in the other direction: a memtracker can have an associated reservationtracker. that way, a memtracker for an execnode can check its reservation when increasing mem consumption (but memtrackers below the execnode level don't have associated reservationtrackers). PS17, Line 67: s don't refer to internal state (reservations_) in a function comment. the function comments should describe (all) externally observable behavior. Line 73: /// counted as consumption against 'mem_tracker'. i don't understand why a reservation is reflected somewhere as consumption. Line 99: /// Decrease tracker's reservation by amount. This tracker must have used at least standardize your terminology (here it's 'used', elsewhere 'consumed'). Line 103: /// Consumes the given amount of the reservation. The tracker must have at least 'amount' long line Line 108: void Release(int64_t amount); release doesn't sound like the opposite of consume (in fact, i'd think it means 'close'). Line 111: int64_t Reservation(); Get...() (and the next 2) -- To view, visit http://gerrit.cloudera.org:8080/2569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I35cc89e863efb4cc506657bfdaaaf633a10bbab6 Gerrit-PatchSet: 17 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
