Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3780: avoid many small reads past end of block
......................................................................

Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3518/3/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

Line 162: read_past_buffer_size = ::min(read_past_buffer_size, max_buffer_size);
> this is fine, but why is this min() needed now whereas it wasn't before?

I changed the contract with the callback a little bit. Before, each callback was responsible for returning a size <= 8MB; otherwise it would hit a DCHECK in Read(). It makes more sense to check this here, right before allocating the scan range.


Line 262: RETURN_IF_ERROR(boundary_buffer_->EnsureCapacity(requested_len));
> were there cases where requested_len is much larger than the number of byte

I don't think so. As far as I've seen, most scanners only do large reads when they know the expected size, e.g. a Parquet column or a compressed block. This will actually save memory for large reads, since each time we double the buffer we can't immediately free the previous buffer.


http://gerrit.cloudera.org:8080/#/c/3518/3/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

PS3, Line 103: .
> ... (see GetNextBuffer()).

Done


http://gerrit.cloudera.org:8080/#/c/3518/3/be/src/runtime/string-buffer.h
File be/src/runtime/string-buffer.h:

PS3, Line 99: int
> this may conflict with Michael's 64-bit change (changed len's to 64-bits).

I'll rebase and then check this.


--
To view, visit http://gerrit.cloudera.org:8080/3518
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id90c5dea44f07dba5dd465cf325fbff28be34137
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
