Tim Armstrong has posted comments on this change. Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. ......................................................................
Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/3832/2//COMMIT_MSG Commit Message: Line 14: the in-flight write to complete before transferring the buffer. > TransferBlock() will either unpin or delete. In the delete case, WriteComp The invariant is !in_write_ || buffer_desc_ != NULL The problem is in the delete case if src->in_write_ is true, then it will grab the buffer_desc_ from src, leaving it in the invalid state where in_write_ is true and buffer_desc_ == NULL. Your comment makes me think that there may be a similar problem in the unpin case that is also solved by this fix. I don't think we ever unpin a block twice so we wouldn't have hit it though. Line 17: woken up. The fix is to clarify when condition variables will be > was this manifesting in a hang? Yep the unit test hung in many cases and crashed in others. I didn't see that in Impala. PS2, Line 19: right > write Done Line 20: notify_all() in case there are multiple threads waiting. > is using notify_all() necessary, or just done for consistency? It isn't really necessary in some cases now that I've reasoned through it. I reverted those changes and updated the header comments to better explain the invariants. Line 28: It succeeded. > is this a configuration that will get coverage at some point during release Hard to say - I only saw it in a specific configuration I was running locally on a specific scale factor with a release build and the PHJ refactor patch. We haven't seen it in other stress tests. Maybe with the PHJ refactor patch we'll see it on other stress tests. I'm pretty happy with the coverage that the unit test provides. http://gerrit.cloudera.org:8080/#/c/3832/2/be/src/runtime/buffered-block-mgr.cc File be/src/runtime/buffered-block-mgr.cc: Line 929: block->write_complete_cv_.notify_all(); > seems okay to use notify_all(), but in practice why would there be multiple I reverted this change and updated the header comment because, as you pointed out, it isn't possible. Line 976: buffer_available_cv_.notify_all(); I'm more convinced now that this change was unnecessary. I was thinking that it would be a problem if a waiter woke up then decided not to grab the buffer, but that shouldn't be possible. I reverted this change and updated the header comment. http://gerrit.cloudera.org:8080/#/c/3832/2/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 449: /// needs to not have been taken when this function is called. > looks like this comment needs updating if we'll always wait for in-flight w Done -- To view, visit http://gerrit.cloudera.org:8080/3832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
