Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3344: Simplify sorter and document/enforce invariants.
......................................................................


Patch Set 9:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/2826/9/be/src/runtime/sorted-run-merger.cc
File be/src/runtime/sorted-run-merger.cc:

Line 88:  Advance() returns with 'eos' set to
       :   /// true.
Init can return eos=true as well


http://gerrit.cloudera.org:8080/#/c/2826/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 56: we
the Sorter


Line 72: If the run is unsorted, it must be sorted. After that set_sorted() 
must be called.
If the run was already sorted, it then may be merged if necessary.


Line 73: resource
resources


Line 129: at most fixed-len
at most one?


Line 158: update counters
I don't see counters being updated


Line 161: 
        :   /// Check if a run can be extended by allocating additional blocks 
from the block
        :   /// manager. Always true when building a sorted run in an 
intermediate merge, because
        :   /// the current block(s) can be unpinned before getting the next 
free block (so a block
        :   /// is always available)
        :   bool CanExtendRun() const;
Remove - I don't see this being used or even implemented


Line 195: StringValue
StringValues


Line 221: Sizes of sort tuple and block
Not sure what this means. Isn't this just "Size of the sort tuple."?


Line 275: 'buffered_batch_' is 
Remove; no need to restate the name.


Line 301: TupleIterator
private?


Line 326:  
const?


Line 439:   int block_size = parent->block_mgr_->max_block_size();
        :   block_capacity_ = block_size / sort_tuple_size_;
        :   last_tuple_block_offset_ = sort_tuple_size_ * ((block_size / 
sort_tuple_size_) - 1);
sorry to harp on this, but I don't think it's too bad to put these in the 
initializer list (so we can make these const) as long as the block offset uses 
the block_capacity_ (which it can do as long as it comes after):

 
   : sorter_(parent),
     sort_tuple_desc_(sort_tuple_desc),
     sort_tuple_size_(sort_tuple_desc->byte_size()),
+    block_capacity_(parent->block_mgr_->max_block_size() / sort_tuple_size_),
+    last_tuple_block_offset_(sort_tuple_size_ * (block_capacity_ - 1)),


Line 773:  if (!HasVarLenBlocks()) DCHECK(!CONVERT_OFFSET_TO_PTR);
conversely, should we check that if there are var len slots, then is_pinned_ == 
CONVERT_OFFSET_TO_PTR ?


Line 776:   
nit: extra space here


Line 777: fixed_len_blocks_.empty()
This is the case where there was no data in the run to begin with or maybe 
someone called GetNext() again after already getting eos? Maybe a brief comment 
since it gets lost after the larger condition before this.


Line 781:     // to the batch on eos.
can we add a DCHECK that if that's the case then it's unpinned?

e.g. DCHECK(fixed_len_blocks_.empty() || !is_pinned_)


Line 893: last
prev_block to be consistent w/ the mode names, and a bit more clear


Line 898: last
prev


Line 986: TotalBytes()
Right now this assumes we haven't deleted any blocks :)
We should check for null blocks while iterating. Might be worth a comment that 
this decreases as blocks are deleted, i.e. it's the total size of blocks 
_currently_ in the run.


Line 986:  {
const


Line 1004: return
DCHECK index_ == 0


Line 1019: index
We could avoid the need for past_end_bytes if you use index_ instead of index 
(it's just off by 1)  :)
obviously the subtlety of these names feels error prone. Maybe we can find 
another variable name for the locally mutated "index" to distinguish from 
index_.


-- 
To view, visit http://gerrit.cloudera.org:8080/2826
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to