Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1346/1590: fix sorter buffer mgmt when spilling
......................................................................


Patch Set 5:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/2908/5//COMMIT_MSG
Commit Message:

Line 7: IMPALA-1346/1590: fix sorter buffer mgmt when spilling
> IMPALA-2344 too?
Done


http://gerrit.cloudera.org:8080/#/c/2908/5/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 1447
> why does this go away?
I had trouble reasoning about whether it was valid or not, and CleanupAllRuns() 
handled it ok. I think it is valid after thinking some more, so I added it back 
with some explanation.


Line 1449: void Sorter::Close() {
> maybe we can add a DCHECK that basically verifies that if the sorter comple
It's a little tricky in some cases: in the case of returning rows from a 
spilled run the blocks are retained after the last GetNext() call. So Close() 
is when we expect that memory to be released. 

I think in some cases exec nodes aren't good about closing children when 
they're done with them, so in that case we may leave some unused memory around, 
but the fix there is to Close() children nodes when they're done.

We now have a bit more test coverage of expected memory consumption so I feel 
better about this.


PS5, Line 1481: blocks_per_run
> how about renaming this pinned_blocks_per_run?
Done


PS5, Line 1484: blocks
> one or two blocks ..
Done


Line 1496:     if (sorted_runs_.size() > 2) {
I realised that the number here was too conservative. Added a comment to 
explain.


PS5, Line 1497: and may
> in case we ..
Done. That's clearer.


Line 1519:     // Should have errored out before this point if we had only two 
runs.
> not sure what this comment is saying. is it referring to this dcheck in som
Yeah, I tried to clarify. The control flow is slightly tricky in the case where 
we have only two runs. In that case, there's no point in doing an intermediate 
merge, because you can do a final merge instead.


Line 1530:   DCHECK_GT(max_num_runs, 1);
> DCHECK_GE(sorted_runs_.size(), 2) or is that not the case? i guess from Mer
Added the DCHECK.

We should never merge a single run. MergeIntermediateRuns() shouldn't be called 
if there's only a single run, then the sorted_runs_.empty() check exits 
MergeIntermediateRuns() if CreateMerger() managed to consume all of 
sorted_runs_ to successfully set up the final merge.


PS5, Line 1547: set up a merge
> have at least two runs ready to merge
Done


Line 1551:       if (merging_runs_.size() >= 2) break;
> okay to ignore but I think it's easier to read as:
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/2908
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idfe55cc13c7f2b54cba1d05ade44cbcf6bb573c0
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to