Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3344: Simplify sorter and document/enforce invariants. ......................................................................
Patch Set 7: (18 comments) Not a full review, this time I focused on sorted-run-merger which I kind of neglected in my past review. I'll get back to sorter.cc tomorrow. http://gerrit.cloudera.org:8080/#/c/2826/7/be/src/runtime/sorted-run-merger.cc File be/src/runtime/sorted-run-merger.cc: Line 33: BatchedRowSupplier A few high level things: 1) In terms of the concepts: BatchedRowSupplier isn't a great name in the context of the SortedRunMerger and the RunBatchSupplier(Fn). Given that this is a class encapsulates a single sorted run for use by the merger. The primary point of this is to translate iterating over RowBatches to iterating over rows, so it's more like an iterator wrapper/translater. What about a name like SortedRunWrapper? In general I don't love names with "Wrapper", but in this case I think it helps put it in its place between the top level Merger and the underlying RunBatchSupplierFn. Or other name suggestions? 2) I think the Next() / HasCurrentRow() checking is confusing. Given that Next() doesn't necessarily move next, it'd be nice if it had an interface that looked more like TryAdvanceMinRow (see my comment in the header about AdvanceMinRow), so maybe something like TryAdvance() which ideally could return a bool* advanced like TryAdvanceMinRow(). Though that wouldn't get rid of the need for HasCurrentRow() because Sorter::GetNext() may need to advance the row if it wasn't done before. I think this would be easier if we always just did the advancing before adding the current row to the output batch instead of after. Then I think Sorter::GetNext() only has 1 call to AdvanceMinRow() instead of two, and it can rely on the *advanced out parameter to check instead of HasCurrentRow(). Line 55: exhausted exhausted, Line 57: resouce resources Line 59: Next Per my larger comment at the top, how about: 1) call this TryAdvance() (matching TryAdvanceMinRow(), well, if you take my comment there). 2) add an out param bool* advanced 3) change bool* done to eos to further differentiate w/ advanced. Line 86: there is at least one row remaining in the batch. ...at least one row (including current_row()) remaining in the current batch. Line 87: before 'done' is returned. before Next() returns with 'done' set to true http://gerrit.cloudera.org:8080/#/c/2826/7/be/src/runtime/sorted-run-merger.h File be/src/runtime/sorted-run-merger.h: Line 46: must I don't think 'must' makes sense Line 48: RunBatchSupplier Can we rename this to RunBatchSupplierFn? I find it confusing with BatchedRowSupplier. Line 69: If 'deep_copy_input_' is false If true, this should always be able to advance, right? Can you maybe add that first as it's the simple case. Line 78: AdvanceMinRow TryAdvanceMinRow? http://gerrit.cloudera.org:8080/#/c/2826/7/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: Line 66: Runs are either "initial runs" constructed from the sorter's input or "intermediate : /// runs" constructed by merging already-sorted runs. The input rows are materialized : /// into sort tuples in the initial runs using the expressions in : /// 'sort_tuple_slot_expr_ctxs_'. It feels like this should be woven in with the paragraph above it where we talk about unsorted vs sorted runs (i.e. the initial runs), and then they're merged, i.e. into "intermediate runs". Line 71: /// The expected calling sequence of functions is as follows: thanks! Line 184: set sets Line 184: return Awkward verb conjugation throughout this sentence. return -> returns Line 185: add adds Line 854: BufferedBlockMgr::Block* curr_block = (*blocks)[block_index]; : BufferedBlockMgr::Block* next_block = (*blocks)[block_index + 1]; Thanks, I think this makes more sense now. Line 974: right right? Line 993: TupleIterator what do you think about having a begin() and end() static constructor? (It wouldn't need to take the index, which is nice.) -- To view, visit http://gerrit.cloudera.org:8080/2826 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a Gerrit-PatchSet: 7 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
