Dan Hecht has posted comments on this change.

Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache reads
......................................................................

Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3313/1//COMMIT_MSG
Commit Message:

PS1, Line 12: re-issuing the whole set of scan ranges
> When the ReadFromCache() fails, it increments the internal file read offset
Thanks. Can you incorporate this in the commit message or JIRA?


Line 23: sped up with performance close to non-cached query runs.
I think we need a way to exercise this path in testing. For instance, the bug where 0 should have been offset_ went unnoticed. Is there no way to get things into a state where the metadata thinks the file is cached but it isn't cached (presumably causing hadoopReadZero() to fail)?


http://gerrit.cloudera.org:8080/#/c/3313/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 432:     hdfsSeek(fs_, hdfs_file_->file(), 0);
> - Yea sorry, this should be offset_. hdfs seems to be incrementing its inte
Regarding Open(), I was wondering why the call to Open() in DiskIoMgr::ReadRange() doesn't do this seek for us, but now I see that Open() returns early if hdfs_file_ != NULL.

Rather than do the seek here, maybe we'd be better off calling Close(). That way, we're undoing all possible side effects of Open(), which is called in this function.


--
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-HasComments: Yes
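
To make the Close() suggestion on line 432 concrete, here is a rough, self-contained sketch. It is not the Impala code and it does not use libhdfs: FakeHdfsFile, FakeScanRange, StartOfDiskRead() and the close_on_failure flag are all hypothetical names invented for illustration. The model assumes, as described in the thread, that a failed cached read leaves the file's internal offset advanced and that Open() returns early when a file handle already exists.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for an open HDFS file handle (not the libhdfs API).
struct FakeHdfsFile {
  int64_t pos = 0;  // internal read offset kept by the "file system"
};

// Hypothetical, heavily simplified model of a scan range.
class FakeScanRange {
 public:
  FakeScanRange(int64_t offset, int64_t len) : offset_(offset), len_(len) {
    assert(offset >= 0 && len >= 0);
  }

  // Opens the file and seeks to offset_. Returns early if the file is
  // already open, so a stale position from an earlier attempt survives.
  void Open() {
    if (file_ != nullptr) return;
    file_ = std::make_unique<FakeHdfsFile>();
    file_->pos = offset_;
  }

  // Undoes every side effect of Open(), including the file position.
  void Close() { file_.reset(); }

  // Models a cached read that always fails, but only after the file
  // position has moved (the assumed side effect of the failed read).
  // With close_on_failure set, the failure path calls Close().
  bool ReadFromCache(bool close_on_failure) {
    Open();
    file_->pos += len_;
    if (close_on_failure) Close();
    return false;
  }

  // Position at which a regular (non-cached) read would start after the
  // fallback, i.e. what a later Open() followed by a read would observe.
  int64_t StartOfDiskRead() {
    Open();
    return file_->pos;
  }

 private:
  int64_t offset_;
  int64_t len_;
  std::unique_ptr<FakeHdfsFile> file_;
};
```

In this model, calling Close() on the failure path means the next Open() re-seeks to offset_, whereas leaving the handle open lets the fallback read start from the advanced position. Whether Close() is cheap enough on the real code path is a separate question that this sketch does not address.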
