Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3344: Simplify sorter and document/enforce invariants. ......................................................................
Patch Set 9: (23 comments) http://gerrit.cloudera.org:8080/#/c/2826/9/be/src/runtime/sorted-run-merger.cc File be/src/runtime/sorted-run-merger.cc: Line 88: Advance() returns with 'eos' set to : /// true. Init can return eos=true as well http://gerrit.cloudera.org:8080/#/c/2826/9/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: Line 56: we the Sorter Line 72: If the run is unsorted, it must be sorted. After that set_sorted() must be called. If the run was already sorted, it then may be merged if necessary. Line 73: resource resources Line 129: at most fixed-len at most one? Line 158: update counters I don't see counters being updated Line 161: : /// Check if a run can be extended by allocating additional blocks from the block : /// manager. Always true when building a sorted run in an intermediate merge, because : /// the current block(s) can be unpinned before getting the next free block (so a block : /// is always available) : bool CanExtendRun() const; Remove - I don't see this being used or even implemented Line 195: StringValue StringValues Line 221: Sizes of sort tuple and block Not sure what this means. Isn't this just "Size of the sort tuple."? Line 275: 'buffered_batch_' is Remove; no need to restate the name. Line 301: TupleIterator private? Line 326: const? Line 439: int block_size = parent->block_mgr_->max_block_size(); : block_capacity_ = block_size / sort_tuple_size_; : last_tuple_block_offset_ = sort_tuple_size_ * ((block_size / sort_tuple_size_) - 1); sorry to harp on this, but I don't think it's too bad to put these in the initializer list (so we can make these const) as long as the block offset uses the block_capacity_ (which it can do as long as it comes after): : sorter_(parent), sort_tuple_desc_(sort_tuple_desc), sort_tuple_size_(sort_tuple_desc->byte_size()), + block_capacity_(parent->block_mgr_->max_block_size() / sort_tuple_size_), + last_tuple_block_offset_(sort_tuple_size_ * (block_capacity_ - 1)), Line 773: if (!HasVarLenBlocks()) DCHECK(!CONVERT_OFFSET_TO_PTR); conversely, should we check that if there are var len slots, then is_pinned_ == CONVERT_OFFSET_TO_PTR ? Line 776: nit: extra space here Line 777: fixed_len_blocks_.empty() This is the case where there was no data in the run to begin with or maybe someone called GetNext() again after already getting eos? Maybe a brief comment since it gets lost after the larger condition before this. Line 781: // to the batch on eos. can we add a DCHECK that if that's the case then it's unpinned? e.g. DCHECK(fixed_len_blocks_.empty() || !is_pinned_) Line 893: last prev_block to be consistent w/ the mode names, and a bit more clear Line 898: last prev Line 986: TotalBytes() Right now this assumes we haven't deleted any blocks :) We should check for null blocks while iterating. Might be worth a comment that this decreases as blocks are deleted, i.e. it's the total size of blocks _currently_ in the run. Line 986: { const Line 1004: return DCHECK index_ == 0 Line 1019: index We could avoid the need for past_end_bytes if you use index_ instead of index (it's just off by 1) :) obviously the subtlety of these names feels error prone. Maybe we can find another variable name for the locally mutated "index" to distinguish from index_. -- To view, visit http://gerrit.cloudera.org:8080/2826 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a Gerrit-PatchSet: 9 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
