Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3671: Add query option to limit scratch space usage
......................................................................


Patch Set 1:

(22 comments)

Looking pretty good, I think we can simplify the design a bit and fix a couple 
of bugs.

http://gerrit.cloudera.org:8080/#/c/3938/1/be/src/runtime/buffered-block-mgr.cc
File be/src/runtime/buffered-block-mgr.cc:

Line 216:     disable_spill_(state->query_ctx().disable_spilling || 
block_write_threshold_ == 0),
There's a subtle bug here if you have scratch_limit = 0, because the block mgr 
will try to flush blocks to disk when it is running low on memory (but can 
still allocate more). 

My interpretation is that scratch_limit = 0 should mean disable spilling, which 
means don't even try to flush.

The fix is just to add (|| scratch_limit == 0) here.

It would be good to add a test to buffered-block-mgr-test similar to the one in 
https://gerrit.cloudera.org/#/c/1722/ just to verify the fix.


Line 245:       block_mgr->reset(
Don't need newline after reset(


Line 813: Status BufferedBlockMgr::AllocateScratchSpace(int64_t block_size,
I feel like it would be nice to eventually put some of this retry logic in 
FileGroup, but that seems like a larger refactoring and not important for the 
current change.


Line 827:       return status;
Add a brief comment. Something like:

// We cannot allocate from any files if we're at the scratch limit.


http://gerrit.cloudera.org:8080/#/c/3938/1/be/src/runtime/buffered-block-mgr.h
File be/src/runtime/buffered-block-mgr.h:

PS1, Line 584: tmp_file_grp
maybe just tmp_file_group? It will be hard to remember the abbreviation.


http://gerrit.cloudera.org:8080/#/c/3938/1/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

Line 224:       && fileGroup_->current_write_size_ + write_size
&& should go on previous line, and I think this condition fits on one line.


Line 226:     return Status(TErrorCode::SCRATCH_LIMIT_EXCEEDED);
Let's include the limit value in the error message, will make it easier for 
users to understand what happened.


PS1, Line 271: AddFile
AddNewFile()?

Otherwise it sounds like the call is providing a file to add.


Line 287:     file.Remove();
I think we need to call delete on these files: in the tmp-file-mgr.h it says 
the caller of GetFile() should free the file.

You could also make this a vector of unique_ptr so that it's automatically 
freed when you call clear().


PS1, Line 292: Size
Maybe NumFiles()? Size is a little ambiguous: could mean the total bytes.


Line 296: int64_t TmpFileMgr::FileGroup::SetSpaceAllocationLimit(int64_t 
max_size){
missing space before {


http://gerrit.cloudera.org:8080/#/c/3938/1/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

Line 88:     /// The FileGroup this belongs to. Can be null if it does not 
belong to any group.
I feel like we should make file_group_ a required field to simplify things, 
require it as an argument to CreateFile(), then update the tests in 
tmp-file-mgr-test to use a trivial FileGroup with no limit. I think it will be 
an easy change if you create a FileGroup with no limit in SetUp() and pass it 
to CreateFile().

That way tmp-file-mgr-test is testing something closer to what is used in 
BufferedBlockMgr and we cut down the code slightly.


PS1, Line 89: fileGroup_
file_group_ (not camel case).


Line 110:   /// Represents a group of files. The total allocated bytes of the 
group can be
The wrapping of some of these comments is a little weird - we normally wrap 
closer to 90 chars.


Line 124:     File* GetFileAt(int index);
I think these methods need one-line comments.


PS1, Line 126: Remove
Remove()


Line 129:     int64_t SetSpaceAllocationLimit(int64_t max_size);
If we're not using the return value anywhere, I'd say just make this void. It's 
easy enough to add it back if we actually need it.


PS1, Line 131: bool
why bool?


Line 134: 
re,move extra space.


PS1, Line 141: current_write_size_
I think the comment would actually be a better variable name: 
current_space_allocated_;


Line 144:     int64_t max_write_size_;
max_space_allocated_?


http://gerrit.cloudera.org:8080/#/c/3938/1/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

Line 240:   // otherwise specified either as:
More concise: 

Otherwise specified in the same was as MEM_LIMIT.


-- 
To view, visit http://gerrit.cloudera.org:8080/3938
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3e9039ad1c0df978ab71b956af33fd997714e73
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to