Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3344: Simplify sorter and document/enforce invariants. ......................................................................
Patch Set 5: (42 comments) ~phew~ finally got through a first pass http://gerrit.cloudera.org:8080/#/c/2826/5/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: Line 92: Finalize How about FinalizeInput to make it clear it happens after consuming input but before sorting/merging? Before I read through all the implementations, I was confused about Finalize() vs PrepareRead(). It would be helpful to have a brief comment in the header walking through the expected calling pattern of these functions. Line 116: . , thus I think joining these sentences will help make it clear this next sentence refers to the case where the run was unpinned Line 116: 1 (2) 1 or 2 (if there are var-len slots) Line 116: Atmost At most Line 116: (2) ? Line 118: In either case, all rows in output_batch will have their fixed and var-len data from : /// the same block. This may be a bit misleading, should this be from the same 1 or 2 block(s), i.e. at most 1 fixed and 1 var-len block? Line 135: // /// Line 143: // /// Line 147: // Finalize the list of blocks: update counters, delete empty final blocks, and /// Line 157: var_slots outdated Line 169: Status Can you comment on when is an error returned, since added indicates whether or not a block was allocated? Line 190: /// data is in the next block. , in which case tuple is unchanged. Line 211: int const Line 214: int same Line 251: /// allocated block. If it was needed, it is deleted in Finalize(). Line 254: so far Does this include the # returned? Line 285: TupleIterator Run should be finalized and unsorted, right? Can you state those conditions and enforce with DCHECKs? I see you already have similar DCHECKs in TupleSorter::Sort(), but I think it's helpful to add them as well for readability & in case anything changes. Line 290: 'run' and 'tuple_size' are passed as arguments to avoid : /// the caller having to have redundant local variables with the same information : /// when using multiple iterators. I don't get what this is referring to Line 305: nit: remove line Line 399: swap tuple nit: swap_tuple Line 610: UnpinAllBlocks If an error is returned, this doesn't return all blocks. I don't know if it's worth addressing that in the code since I guess the error status gets propagated, should fail, and then gets cleaned up, but it would be helpful to indicate that behavior in the header comment. Line 647: if (!added) { : Status status = Status::MemLimitExceeded(); : status.AddDetail(Substitute(MEM_ALLOC_FAILED_ERROR_MSG, "variable")); : return status; : } TryAddBlock says that calling with UNPIN_PREV should succeed. Is that wrong? Line 739: if it did, we would have to return them to the caller. Is it too tricky to add a DCHECK for this? I took a look and guess there's not a trivial way to do it with the way we handle non-partitioned agg/join mem. Line 779: // GetNext() fills rows More direct: // Fills rows into ... Line 803: ++var_len_blocks_index_; : end_of_var_len_block_ = true; I think it's a little confusing to increment the block_idx and also set a bool indicating the end of the block, because I'd assume end_of_var_len_block_ is referring to being at the end of the block at var_len_blocks_idx_, which is not the case. Would it be possible to increment the idx after PinNextReadBlock()? (That'd mean PinNextReadBlock's logic/math would shift a bit.) Line 838: BufferedBlockMgr::Block* block = (*blocks)[block_index]; : BufferedBlockMgr::Block* prev_block = (*blocks)[block_index - 1]; If you make the change i mentioned on l804, this'd be next_block and the line below would be curr_block (or something like that), but then at least on l849 we'd have next_block->Pin(), which makes sense w/ the way this method is named. Maybe this fn would take curr_block_idx and afterwards the caller increments the X_len_blocks_index_. There are other ways to do this too, my original concern is just that it feels inconsistent right now between the indexing and what we consider to be current/next. Line 859: for(const SlotDescriptor* string_slot: sort_tuple_desc_->string_slots()) { nit missing space after for? Line 945: DCHECK_EQ(block_offset, 0); is this true because this is going to be the first string in the next block? I think that makes sense... Can you add a brief comment? Line 956: May be NULL for : // zero-length strings. Please add a quick comment that this case comes from NULL+0=NULL, since NULL is just a typedef for 0. Also a DCHECK that we don't have NULL+x for any non-zero integer, e.g. DHECK(block_start != NULL || block_offset == 0); Line 976: index_(index), : tuple_(NULL) { nit: 1 line Line 984: to remove Line 985: // block_index_ as if it pointing to the last tuple. Add tuple_size_ bytes to it is Line 987: int past_end_bytes = 0; > This logic for initializing it past the end of the is hairy but it does wor Maybe it's cleaner to have a static constructor that returns a TupleIterator at the end (e.g. end(), and prob a symmetrical begin()). That can call a private constructor that just sets the member vars but at least we don't have to handle this as a special case. I don't think we need any other public constructors, but if we do we can expose it easily. Line 1043: temp_tuple_buffer_ = new uint8_t[tuple_size]; : swap_buffer_ = new uint8_t[tuple_size]; will your later work clean up these untracked allocations? Line 1329: if This num_free_buffers() as well as the available_allocated_buffers() (l1348) seems sketchy. I didn't understand why this would work if other operators could be consuming memory between here and l1348. Also there are warnings in buffered-block-mgr.h:392 indicating potential issues using these accessors. Is this something to clarify/address in your next patch? Line 1387: Last Why start calling this "Last"? At the time this is called it seems like it's more the current run. Line 1389: sorted_data_size_->Add(unsorted_run_->TotalBytes()); can you move this down after the sort happens? Line 1394: RETURN_IF_CANCELLED(state_); maybe move this below this scoped block, and the counter incr form l1389 right above this Line 1403: int max_runs_per_final_merge = : block_mgr_->available_allocated_buffers() / blocks_per_run; If I follow this (and the rest of the fn) correctly, sorting is more broken than I thought ... but I see you changed this in your next review so I'll skip over this for now. Line 1416: sifficient sufficient http://gerrit.cloudera.org:8080/#/c/2826/5/tests/query_test/test_sort.py File tests/query_test/test_sort.py: Line 121: mean means Line 123: query nit: line above this -- To view, visit http://gerrit.cloudera.org:8080/2826 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
