Tim Armstrong has posted comments on this change. Change subject: IMPALA-3671: Add query option to limit scratch space usage ......................................................................
Patch Set 1: (22 comments) Looking pretty good, I think we can simplify the design a bit and fix a couple of bugs. http://gerrit.cloudera.org:8080/#/c/3938/1/be/src/runtime/buffered-block-mgr.cc File be/src/runtime/buffered-block-mgr.cc: Line 216: disable_spill_(state->query_ctx().disable_spilling || block_write_threshold_ == 0), There's a subtle bug here if you have scratch_limit = 0, because the block mgr will try to flush blocks to disk when it is running low on memory (but can still allocate more). My interpretation is that scratch_limit = 0 should mean disable spilling, which means don't even try to flush. The fix is just to add (|| scratch_limit == 0) here. It would be good to add a test to buffered-block-mgr-test similar to the one in https://gerrit.cloudera.org/#/c/1722/ just to verify the fix. Line 245: block_mgr->reset( Don't need newline after reset( Line 813: Status BufferedBlockMgr::AllocateScratchSpace(int64_t block_size, I feel like it would be nice to eventually put some of this retry logic in FileGroup, but that seems like a larger refactoring and not important for the current change. Line 827: return status; Add a brief comment. Something like: // We cannot allocate from any files if we're at the scratch limit. http://gerrit.cloudera.org:8080/#/c/3938/1/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: PS1, Line 584: tmp_file_grp maybe just tmp_file_group? It will be hard to remember the abbreviation. http://gerrit.cloudera.org:8080/#/c/3938/1/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: Line 224: && fileGroup_->current_write_size_ + write_size && should go on previous line, and I think this condition fits on one line. Line 226: return Status(TErrorCode::SCRATCH_LIMIT_EXCEEDED); Let's include the limit value in the error message, will make it easier for users to understand what happened. PS1, Line 271: AddFile AddNewFile()? Otherwise it sounds like the call is providing a file to add. Line 287: file.Remove(); I think we need to call delete on these files: in the tmp-file-mgr.h it says the caller of GetFile() should free the file. You could also make this a vector of unique_ptr so that it's automatically freed when you call clear(). PS1, Line 292: Size Maybe NumFiles()? Size is a little ambiguous: could mean the total bytes. Line 296: int64_t TmpFileMgr::FileGroup::SetSpaceAllocationLimit(int64_t max_size){ missing space before { http://gerrit.cloudera.org:8080/#/c/3938/1/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: Line 88: /// The FileGroup this belongs to. Can be null if it does not belong to any group. I feel like we should make file_group_ a required field to simplify things, require it as an argument to CreateFile(), then update the tests in tmp-file-mgr-test to use a trivial FileGroup with no limit. I think it will be an easy change if you create a FileGroup with no limit in SetUp() and pass it to CreateFile(). That way tmp-file-mgr-test is testing something closer to what is used in BufferedBlockMgr and we cut down the code slightly. PS1, Line 89: fileGroup_ file_group_ (not camel case). Line 110: /// Represents a group of files. The total allocated bytes of the group can be The wrapping of some of these comments is a little weird - we normally wrap closer to 90 chars. Line 124: File* GetFileAt(int index); I think these methods need one-line comments. PS1, Line 126: Remove Remove() Line 129: int64_t SetSpaceAllocationLimit(int64_t max_size); If we're not using the return value anywhere, I'd say just make this void. It's easy enough to add it back if we actually need it. PS1, Line 131: bool why bool? Line 134: re,move extra space. PS1, Line 141: current_write_size_ I think the comment would actually be a better variable name: current_space_allocated_; Line 144: int64_t max_write_size_; max_space_allocated_? http://gerrit.cloudera.org:8080/#/c/3938/1/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: Line 240: // otherwise specified either as: More concise: Otherwise specified in the same was as MEM_LIMIT. -- To view, visit http://gerrit.cloudera.org:8080/3938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie3e9039ad1c0df978ab71b956af33fd997714e73 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
