Dan Hecht has posted comments on this change.

Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache 
reads
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3313/1//COMMIT_MSG
Commit Message:

PS1, Line 12: re-issuing the whole set of scan ranges
> When the ReadFromCache() fails, it increments the internal file read offset
Thanks.  Can you incorporate this in the commit message or JIRA?


Line 23: sped up with performance close to non-cached query runs.
I think we need a way to exercise this path in testing.  For instance, the bug 
where the seek used 0 instead of offset_ went unnoticed.  Is there a way to get 
things into a state where the metadata thinks the file is cached but it actually 
isn't (presumably causing hadoopReadZero() to fail)?
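
To make the kind of test I have in mind concrete, here is a rough sketch.  It 
does not use the real DiskIoMgr or libhdfs; FakeHdfsFile, cache_read_fails and 
ReadRange() are hypothetical names invented for illustration, and the fake only 
models the behavior discussed in this review (a failed cached read still 
advances the file position).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an HDFS file handle; it tracks only a read position.
struct FakeHdfsFile {
  std::string data;
  int64_t pos = 0;
  // Fault injection: the metadata claims the file is cached, but the read fails.
  bool cache_read_fails = false;

  // Moves the read position, clamped to the bounds of the fake file.
  void Seek(int64_t offset) {
    int64_t size = static_cast<int64_t>(data.size());
    pos = std::max<int64_t>(0, std::min(offset, size));
  }

  // Number of bytes that can be read from the current position, at most len.
  int64_t Remaining(int64_t len) const {
    int64_t left = static_cast<int64_t>(data.size()) - pos;
    return std::max<int64_t>(0, std::min(len, left));
  }

  // Models a cached read. On failure it returns false but has still advanced
  // the position (an assumption taken from the discussion in this review).
  bool ReadFromCache(int64_t len, std::string* out) {
    int64_t n = Remaining(len);
    if (!cache_read_fails) *out = data.substr(pos, n);
    pos += n;
    return !cache_read_fails;
  }

  // Models the ordinary, non-cached read path.
  std::string ReadNormal(int64_t len) {
    int64_t n = Remaining(len);
    std::string out = data.substr(pos, n);
    pos += n;
    return out;
  }
};

// Reads len bytes starting at offset: tries the cache first and falls back to a
// normal read after seeking to reset_to. Passing reset_to == offset is the
// intended fix; passing 0 reproduces the bug that went unnoticed.
std::string ReadRange(FakeHdfsFile* f, int64_t offset, int64_t len,
                      int64_t reset_to) {
  f->Seek(offset);
  std::string out;
  if (f->ReadFromCache(len, &out)) return out;
  f->Seek(reset_to);
  return f->ReadNormal(len);
}
```

The point is that a test only catches the wrong seek target when the scan range 
starts at a non-zero offset; with offset 0, seeking to 0 and seeking to offset_ 
are indistinguishable.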


http://gerrit.cloudera.org:8080/#/c/3313/1/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 432:       hdfsSeek(fs_, hdfs_file_->file(), 0);
> - Yea sorry, this should be offset_. hdfs seems to be incrementing its inte
Regarding Open(), I was wondering why the call to Open() in 
DiskIoMgr::ReadRange() doesn't do this seek for us, but now I see that Open() 
returns early if hdfs_file_ != NULL.

Rather than doing the seek here, maybe we'd be better off calling Close().  That 
way, we undo all possible side effects of Open(), which is called in this 
function.
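
Roughly what I mean, as a self-contained sketch rather than the real code: 
FakeScanRange and FakeFile are hypothetical, and I'm assuming that Open() 
positions a freshly opened file at offset_ and returns early when a handle 
already exists.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical file handle that only remembers its read position.
struct FakeFile {
  int64_t pos = 0;
};

// Hypothetical scan range modelling the Open()/Close() interaction.
class FakeScanRange {
 public:
  explicit FakeScanRange(int64_t offset) : offset_(offset), open_count_(0) {}

  // Returns early if a handle already exists; otherwise opens the file and
  // positions it at offset_ (assumed behavior, see the note above).
  void Open() {
    if (file_ != nullptr) return;
    file_.reset(new FakeFile());
    file_->pos = offset_;
    ++open_count_;
  }

  // Drops the handle, undoing every side effect of Open().
  void Close() { file_.reset(); }

  // Simulates a cached read that fails after advancing the position.
  // With close_on_failure set, the handle is released so that the next
  // Open() starts again from offset_.
  bool ReadFromCacheFailing(int64_t len, bool close_on_failure) {
    Open();
    file_->pos += len;
    if (close_on_failure) Close();
    return false;
  }

  bool is_open() const { return file_ != nullptr; }
  // Current read position, or -1 when no file is open.
  int64_t Position() const { return file_ ? file_->pos : -1; }
  int open_count() const { return open_count_; }

 private:
  int64_t offset_;
  int open_count_;
  std::unique_ptr<FakeFile> file_;
};
```

With close_on_failure set, a later Open() reopens the file and lands back on 
offset_.  Without it, Open() returns early and the stale position left by the 
failed read is what the fallback read sees.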


-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-HasComments: Yes
