Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache reads ......................................................................
Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/3313/1//COMMIT_MSG Commit Message: PS1, Line 12: re-issuing the whole set of scan ranges > what "set" of scan ranges? and where does this happen? When the ReadFromCache() fails, it increments the internal file read offset but doesn't read any data. Then we fallback to ReadRange(), which again fails because we try to read past the file_size (due to incorrect offset). Then we enter [1], where impalad thinks the data is still left to be read and tries to read past scan_range and hence uses DEFAULT_READ_PAST_SIZE(=1KB). This repeatedly starts a large no. of 1KB size scanranges till the whole range is read. [1] https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/scanner-context.cc#L145 http://gerrit.cloudera.org:8080/#/c/3313/1/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 432: hdfsSeek(fs_, hdfs_file_->file(), 0); > how did this work before, and how does this work? Shouldn't we seek to fil - Yea sorry, this should be offset_. hdfs seems to be incrementing its internal file offset even though the read fails and when we call hdfsReadFile() after fallback, its resuming from that offset again, because we are reusing hdfs_file_->file() object. So per my understanding, this is still required even if we are not calling ScanRange::Open() - Regarding how this worked before, I replied to your comment on the commit message. -- To view, visit http://gerrit.cloudera.org:8080/3313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-HasComments: Yes
