Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in 
flight.
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/3832/2//COMMIT_MSG
Commit Message:

Line 14:   the in-flight write to complete before transferring the buffer.
> TransferBlock() will either unpin or delete.  In the delete case, WriteComp
The invariant is !in_write_ || buffer_desc_ != NULL

The problem is in the delete case if src->in_write_ is true, then it will grab 
the buffer_desc_ from src, leaving it in the invalid state where in_write_ is 
true and buffer_desc_ == NULL.

Your comment makes me think that there may be a similar problem in the unpin 
case that is also solved by this fix. I don't think we ever unpin a block twice 
so we wouldn't have hit it though.


Line 17:   woken up. The fix is to clarify when condition variables will be
> was this manifesting in a hang?
Yep the unit test hung in many cases and crashed in others. I didn't see that 
in Impala.


PS2, Line 19: right
> write
Done


Line 20:   notify_all() in case there are multiple threads waiting.
> is using notify_all() necessary, or just done for consistency?
It isn't really necessary in some cases now that I've reasoned through it. I 
reverted those changes and updated the header comments to better explain the 
invariants.


Line 28: It succeeded.
> is this a configuration that will get coverage at some point during release
Hard to say - I only saw it in a specific configuration I was running locally 
on a specific scale factor with a release build and the PHJ refactor patch. We 
haven't seen it in other stress tests. Maybe with the PHJ refactor patch we'll 
see it on other stress tests.

I'm pretty happy with the coverage that the unit test provides.


http://gerrit.cloudera.org:8080/#/c/3832/2/be/src/runtime/buffered-block-mgr.cc
File be/src/runtime/buffered-block-mgr.cc:

Line 929:     block->write_complete_cv_.notify_all();
> seems okay to use notify_all(), but in practice why would there be multiple
I reverted this change and updated the header comment because, as you pointed 
out, it isn't possible.


Line 976:         buffer_available_cv_.notify_all();
I'm more convinced now that this change was unnecessary. I was thinking that it 
would be a problem if a waiter woke up then decided not to grab the buffer, but 
that shouldn't be possible. 

I reverted this change and updated the header comment.


http://gerrit.cloudera.org:8080/#/c/3832/2/be/src/runtime/buffered-block-mgr.h
File be/src/runtime/buffered-block-mgr.h:

Line 449:   /// needs to not have been taken when this function is called.
> looks like this comment needs updating if we'll always wait for in-flight w
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3832
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to