Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3344: Simplify sorter and document/enforce invariants.
......................................................................


Patch Set 7:

(18 comments)

Not a full review, this time I focused on sorted-run-merger which I kind of 
neglected in my past review. I'll get back to sorter.cc tomorrow.

http://gerrit.cloudera.org:8080/#/c/2826/7/be/src/runtime/sorted-run-merger.cc
File be/src/runtime/sorted-run-merger.cc:

Line 33: BatchedRowSupplier
A few high level things:

1) In terms of the concepts:
BatchedRowSupplier isn't a great name in the context of the SortedRunMerger and 
the RunBatchSupplier(Fn).

Given that this is a class encapsulates a single sorted run for use by the 
merger. The primary point of this is to translate iterating over RowBatches to 
iterating over rows, so it's more like an iterator wrapper/translater. What 
about a name like SortedRunWrapper? In general I don't love names with 
"Wrapper", but in this case I think it helps put it in its place between the 
top level Merger and the underlying RunBatchSupplierFn. Or other name 
suggestions?


2) I think the Next() / HasCurrentRow() checking is confusing. Given that 
Next() doesn't necessarily move next, it'd be nice if it had an interface that 
looked more like TryAdvanceMinRow (see my comment in the header about 
AdvanceMinRow), so maybe something like TryAdvance() which ideally could return 
a bool* advanced like TryAdvanceMinRow(). Though that wouldn't get rid of the 
need for HasCurrentRow() because Sorter::GetNext() may need to advance the row 
if it wasn't done before. I think this would be easier if we always just did 
the advancing before adding the current row to the output batch instead of 
after. Then I think Sorter::GetNext() only has 1 call to AdvanceMinRow() 
instead of two, and it can rely on the *advanced out parameter to check instead 
of HasCurrentRow().


Line 55: exhausted
exhausted,


Line 57: resouce
resources


Line 59: Next
Per my larger comment at the top, how about:
1) call this TryAdvance() (matching TryAdvanceMinRow(), well, if you take my 
comment there).
2) add an out param bool* advanced
3) change bool* done to eos to further differentiate w/ advanced.


Line 86: there is at least one row remaining in the batch.
...at least one row (including current_row()) remaining in the current batch.


Line 87: before 'done' is returned.
before Next() returns with 'done' set to true


http://gerrit.cloudera.org:8080/#/c/2826/7/be/src/runtime/sorted-run-merger.h
File be/src/runtime/sorted-run-merger.h:

Line 46: must
I don't think 'must' makes sense


Line 48: RunBatchSupplier
Can we rename this to RunBatchSupplierFn? I find it confusing with 
BatchedRowSupplier.


Line 69: If 'deep_copy_input_' is false
If true, this should always be able to advance, right? Can you maybe add that 
first as it's the simple case.


Line 78: AdvanceMinRow
TryAdvanceMinRow?


http://gerrit.cloudera.org:8080/#/c/2826/7/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 66: Runs are either "initial runs" constructed from the sorter's input or 
"intermediate
       : /// runs" constructed by merging already-sorted runs. The input rows 
are materialized
       : /// into sort tuples in the initial runs using the expressions in
       : /// 'sort_tuple_slot_expr_ctxs_'.
It feels like this should be woven in with the paragraph above it where we talk 
about unsorted vs sorted runs (i.e. the initial runs), and then they're merged, 
i.e. into "intermediate runs".


Line 71: /// The expected calling sequence of functions is as follows:
thanks!


Line 184: set
sets


Line 184: return
Awkward verb conjugation throughout this sentence.

return -> returns


Line 185: add
adds


Line 854:   BufferedBlockMgr::Block* curr_block = (*blocks)[block_index];
        :   BufferedBlockMgr::Block* next_block = (*blocks)[block_index + 1];
Thanks, I think this makes more sense now.


Line 974: right
right?


Line 993: TupleIterator
what do you think about having a begin() and end() static constructor? (It 
wouldn't need to take the index, which is nice.)


-- 
To view, visit http://gerrit.cloudera.org:8080/2826
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to