Tim Armstrong has posted comments on this change. Change subject: IMPALA-3611: track unused Disk IO buffer memory ......................................................................
Patch Set 6: (10 comments) http://gerrit.cloudera.org:8080/#/c/3246/6/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: PS6, Line 441: NULL > Is there a reason to not use "reader->mem_tracker_ != NULL ? reader->mem_tr We don't want to track this memory, since it's a cached buffer that's owned by the HDFS client in the JVM. Added a comment. http://gerrit.cloudera.org:8080/#/c/3246/6/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: PS6, Line 224: > nit: wrong indentation. Done PS6, Line 368: Other > Untracked / Unowned ? Went with untracked. I thought this might be a little confusing, since they are technically tracked, but seems clearer than 'other'. PS6, Line 702: (1 << idx) > 1LL to be consistent with the 64-bit-ness above. I think in practice, this Done (here and elsewhere in this file) PS6, Line 790: // Note: we can't use TryConsume(), which can indirectly call GcIoBuffers(). > , which can indirectly call GcIoBuffers() and lead to deadlock. Done PS6, Line 857: // TODO: we can do a lot better here. The reader can likely make progress : // with fewer io buffers. > Can you please replace this TODO statement with one which indicates that "t Done. Also added a TODO referencing IMPALA-3209, since it's related to making scanners operate in a memory limit. PS6, Line 859: free_buffer_mem_tracker_->AnyLimitExceeded(); > Regarding your previous comment about not calling GcIoBuffer() here, I am n It's safe to do that here since we're not holding free_buffers_lock_ at this point in the code. The change doesn't make a functional difference, it's just for consistency since we are no longer tracking directly against process_mem_tracker_. The pattern generally in the codebase is to interact only with your local MemTracker. http://gerrit.cloudera.org:8080/#/c/3246/6/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: PS6, Line 232: bool is_cached() { return scan_range_->cached_buffer_ != NULL; } > This can be private. Done PS6, Line 237: MemTracker* mem_tracker() { return mem_tracker_; } > Is this used outside of disk-io-mgr at all ? Not now. Removed it. It was at one point when I tried to move this to RowBatch. PS6, Line 243: 'tracker' > 'dst' ? It may be clearer to say transfer from 'mem_tracker_' to 'dst_'. Pl I think this is more related to IMPALA-3209, since this is part of the cycle of memory transfer between the disk io mgr, scanners, and upstream exec nodes. Added a reference to that JIRA. I looked at just checking the memory limit and returning a status, but it looked like exiting early from some of the calling functions could leave things in an inconsistent state. -- To view, visit http://gerrit.cloudera.org:8080/3246 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8777cf76f04d34a46f53d53005412e0f1d63b5b7 Gerrit-PatchSet: 6 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
