Tim Armstrong has posted comments on this change. Change subject: IMPALA-3344: Simplify sorter and document/enforce invariants. ......................................................................
Patch Set 9: (23 comments) http://gerrit.cloudera.org:8080/#/c/2826/9/be/src/runtime/sorted-run-merger.cc File be/src/runtime/sorted-run-merger.cc: Line 88: Advance() returns with 'eos' set to : /// true. > Init can return eos=true as well Good point. http://gerrit.cloudera.org:8080/#/c/2826/9/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: Line 56: we > the Sorter Done Line 72: If the run is unsorted, it must be sorted. After that set_sorted() must be called. > If the run was already sorted, it then may be merged if necessary. Added. I'm on the fence about whether this is the right place to describe it, since the Run abstraction doesn't really care about what happens with the sorted output. It seems like useful context so I added it though. Line 73: resource > resources Done Line 129: at most fixed-len > at most one? Done Line 158: update counters > I don't see counters being updated Done Line 161: : /// Check if a run can be extended by allocating additional blocks from the block : /// manager. Always true when building a sorted run in an intermediate merge, because : /// the current block(s) can be unpinned before getting the next free block (so a block : /// is always available) : bool CanExtendRun() const; > Remove - I don't see this being used or even implemented Good point. Line 195: StringValue > StringValues Done Line 221: Sizes of sort tuple and block > Not sure what this means. Isn't this just "Size of the sort tuple."? Yep, stale comment. Fixed. Line 275: 'buffered_batch_' is > Remove; no need to restate the name. Done Line 301: TupleIterator > private? We need to construct TupleIterators by index in SelectPivot(). The other iterators are all derived by incrementing/decrementing begin()/end(). Line 326: > const? Done. This also required fixing const-correctness for TupleRow in the whole expr/ subtree, but I was able to do it with sed and it seems worth fixing. Line 439: int block_size = parent->block_mgr_->max_block_size(); : block_capacity_ = block_size / sort_tuple_size_; : last_tuple_block_offset_ = sort_tuple_size_ * ((block_size / sort_tuple_size_) - 1); > sorry to harp on this, but I don't think it's too bad to put these in the i Done Line 773: if (!HasVarLenBlocks()) DCHECK(!CONVERT_OFFSET_TO_PTR); > conversely, should we check that if there are var len slots, then is_pinned Done Line 776: > nit: extra space here It was deliberate to try and make the nesting more obvious. In the end I fixed this by setting end_of_fixed_len_block_ to true in PrepareRead() for this case and changing it to >= Line 777: fixed_len_blocks_.empty() > This is the case where there was no data in the run to begin with or maybe See prev comment. Line 781: // to the batch on eos. > can we add a DCHECK that if that's the case then it's unpinned? I added some DCHECKS that are a little involved but I think provide good coverage. I don't think empty() works since we set the block ptrs to NULL instead of removing them from the vector. These also caught a bug - we didn't attach the last var-len block to the batch. As a result of fixing this bug, a test that was meant to hit a mem-limit didn't hit it, so I updated that test too. Line 893: last > prev_block to be consistent w/ the mode names, and a bit more clear Done Line 898: last > prev Done Line 986: { > const Done Line 986: TotalBytes() > Right now this assumes we haven't deleted any blocks :) Done Line 1004: return > DCHECK index_ == 0 Done Line 1019: index > We could avoid the need for past_end_bytes if you use index_ instead of ind I made the logic a bit more explicit, at the cost of some duplication. I hope it's clearer. -- To view, visit http://gerrit.cloudera.org:8080/2826 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a Gerrit-PatchSet: 9 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
