Tim Armstrong has posted comments on this change. Change subject: IMPALA-1346/1590: fix sorter buffer mgmt when spilling ......................................................................
Patch Set 5: (11 comments) http://gerrit.cloudera.org:8080/#/c/2908/5//COMMIT_MSG Commit Message: Line 7: IMPALA-1346/1590: fix sorter buffer mgmt when spilling > IMPALA-2344 too? Done http://gerrit.cloudera.org:8080/#/c/2908/5/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: Line 1447 > why does this go away? I had trouble reasoning about whether it was valid or not, and CleanupAllRuns() handled it ok. I think it is valid after thinking some more, so I added it back with some explanation. Line 1449: void Sorter::Close() { > maybe we can add a DCHECK that basically verifies that if the sorter comple It's a little tricky in some cases: in the case of returning rows from a spilled run the blocks are retained after the last GetNext() call. So Close() is when we expect that memory to be released. I think in some cases exec nodes aren't good about closing children when they're done with them, so in that case we may leave some unused memory around, but the fix there is to Close() children nodes when they're done. We now have a bit more test coverage of expected memory consumption so I feel better about this. PS5, Line 1481: blocks_per_run > how about renaming this pinned_blocks_per_run? Done PS5, Line 1484: blocks > one or two blocks .. Done Line 1496: if (sorted_runs_.size() > 2) { I realised that the number here was too conservative. Added a comment to explain. PS5, Line 1497: and may > in case we .. Done. That's clearer. Line 1519: // Should have errored out before this point if we had only two runs. > not sure what this comment is saying. is it referring to this dcheck in som Yeah, I tried to clarify. The control flow is slightly tricky in the case where we have only two runs. In that case, there's no point in doing an intermediate merge, because you can do a final merge instead. Line 1530: DCHECK_GT(max_num_runs, 1); > DCHECK_GE(sorted_runs_.size(), 2) or is that not the case? i guess from Mer Added the DCHECK. We should never merge a single run. MergeIntermediateRuns() shouldn't be called if there's only a single run, then the sorted_runs_.empty() check exits MergeIntermediateRuns() if CreateMerger() managed to consume all of sorted_runs_ to successfully set up the final merge. PS5, Line 1547: set up a merge > have at least two runs ready to merge Done Line 1551: if (merging_runs_.size() >= 2) break; > okay to ignore but I think it's easier to read as: Done -- To view, visit http://gerrit.cloudera.org:8080/2908 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idfe55cc13c7f2b54cba1d05ade44cbcf6bb573c0 Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
