Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3344: Simplify sorter and document/enforce invariants.
......................................................................


Patch Set 9:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/2826/9/be/src/runtime/sorted-run-merger.cc
File be/src/runtime/sorted-run-merger.cc:

Line 88:  Advance() returns with 'eos' set to
       :   /// true.
> Init can return eos=true as well
Good point.


http://gerrit.cloudera.org:8080/#/c/2826/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 56: we
> the Sorter
Done


Line 72: If the run is unsorted, it must be sorted. After that set_sorted() 
must be called.
> If the run was already sorted, it then may be merged if necessary.
Added. I'm on the fence about whether this is the right place to describe it, 
since the Run abstraction doesn't really care about what happens with the 
sorted output. It seems like useful context so I added it though.


Line 73: resource
> resources
Done


Line 129: at most fixed-len
> at most one?
Done


Line 158: update counters
> I don't see counters being updated
Done


Line 161: 
        :   /// Check if a run can be extended by allocating additional blocks 
from the block
        :   /// manager. Always true when building a sorted run in an 
intermediate merge, because
        :   /// the current block(s) can be unpinned before getting the next 
free block (so a block
        :   /// is always available)
        :   bool CanExtendRun() const;
> Remove - I don't see this being used or even implemented
Good point.


Line 195: StringValue
> StringValues
Done


Line 221: Sizes of sort tuple and block
> Not sure what this means. Isn't this just "Size of the sort tuple."?
Yep, stale comment. Fixed.


Line 275: 'buffered_batch_' is 
> Remove; no need to restate the name.
Done


Line 301: TupleIterator
> private?
We need to construct TupleIterators by index in SelectPivot(). The other 
iterators are all derived by incrementing/decrementing begin()/end().


Line 326:  
> const?
Done. This also required fixing const-correctness for TupleRow in the whole 
expr/ subtree, but I was able to do it with sed and it seems worth fixing.


Line 439:   int block_size = parent->block_mgr_->max_block_size();
        :   block_capacity_ = block_size / sort_tuple_size_;
        :   last_tuple_block_offset_ = sort_tuple_size_ * ((block_size / 
sort_tuple_size_) - 1);
> sorry to harp on this, but I don't think it's too bad to put these in the i
Done


Line 773:  if (!HasVarLenBlocks()) DCHECK(!CONVERT_OFFSET_TO_PTR);
> conversely, should we check that if there are var len slots, then is_pinned
Done


Line 776:   
> nit: extra space here
It was deliberate to try and make the nesting more obvious. In the end I fixed 
this by setting end_of_fixed_len_block_ to true in PrepareRead() for this case 
and changing it to >=


Line 777: fixed_len_blocks_.empty()
> This is the case where there was no data in the run to begin with or maybe 
See prev comment.


Line 781:     // to the batch on eos.
> can we add a DCHECK that if that's the case then it's unpinned?
I added some DCHECKS that are a little involved but I think provide good 
coverage. I don't think empty() works since we set the block ptrs to NULL 
instead of removing them from the vector.

These also caught a bug - we didn't attach the last var-len block to the batch. 
As a result of fixing this bug, a test that was meant to hit a mem-limit didn't 
hit it, so I updated that test too.


Line 893: last
> prev_block to be consistent w/ the mode names, and a bit more clear
Done


Line 898: last
> prev
Done


Line 986:  {
> const
Done


Line 986: TotalBytes()
> Right now this assumes we haven't deleted any blocks :)
Done


Line 1004: return
> DCHECK index_ == 0
Done


Line 1019: index
> We could avoid the need for past_end_bytes if you use index_ instead of ind
I made the logic a bit more explicit, at the cost of some duplication. I hope 
it's clearer.


-- 
To view, visit http://gerrit.cloudera.org:8080/2826
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to