Bharath Vissapragada has uploaded a new patch set (#2).

Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache reads
......................................................................
IMPALA-3680: Reset the file read offset for failed hdfs cache reads

Currently we don't reset the file read offset if a zero-copy read (ZCR)
fails. Because of this, when we switch to the normal read path, we hit the
end of the scan range (eosr) before reading the expected data length. This
results in re-issuing the whole set of scan ranges and hence a severe
performance hit. This patch sets the file read offset back to the
beginning of the scan range if the ReadFromCache() call fails.

This was hit while debugging IMPALA-3679, where queries on 1 GB of cached
data were running ~20x slower than non-cached runs.

Testing: It's difficult to simulate failed cache reads in tests, so I
tested this manually by adding extra logging in Read() and ReadFromCache().
The queries mentioned above on the 1 GB dataset also sped up, with
performance close to that of the non-cached query runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
1 file changed, 7 insertions(+), 1 deletion(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/2
--
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
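
Illustrative note on the change description: the failure mode in the commit message can be modelled with a small standalone sketch. This is not the code in disk-io-mgr-scan-range.cc. FakeFile, ScanRange, ReadNormal, ReadRange and the reset_on_failure flag are hypothetical names invented here, and the ReadFromCache() below only borrows its name from the commit message. The sketch assumes the failed cache read leaves the file cursor partway through the scan range, which is one plausible reading of "we don't reset the file read offset".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an open file handle with a read cursor.
struct FakeFile {
  std::string data;
  int64_t pos = 0;

  void Seek(int64_t offset) { pos = offset; }

  // Reads up to 'len' bytes starting at the cursor and advances the cursor.
  std::string Read(int64_t len) {
    assert(len >= 0);
    const int64_t size = static_cast<int64_t>(data.size());
    const int64_t n = std::min(len, std::max<int64_t>(0, size - pos));
    if (n == 0) return "";
    std::string out = data.substr(static_cast<std::size_t>(pos),
                                  static_cast<std::size_t>(n));
    pos += n;
    return out;
  }
};

// Hypothetical scan range: 'len' bytes starting at byte 'offset'.
struct ScanRange {
  int64_t offset;
  int64_t len;
};

// Simulated cache read that always fails after consuming half of the range.
// With 'reset_on_failure' set, the cursor is moved back to the start of the
// range before returning, which is the behaviour the patch describes.
bool ReadFromCache(FakeFile& file, const ScanRange& range,
                   bool reset_on_failure) {
  file.Seek(range.offset);
  file.Read(range.len / 2);  // Partial progress moves the cursor.
  if (reset_on_failure) file.Seek(range.offset);
  return false;  // Assumption: the cache read fails in this model.
}

// Normal read path: reads from the current cursor up to the end of the range.
std::string ReadNormal(FakeFile& file, const ScanRange& range) {
  const int64_t remaining =
      std::max<int64_t>(0, range.offset + range.len - file.pos);
  return file.Read(remaining);
}

// Tries the cache first and falls back to the normal path on failure.
std::string ReadRange(FakeFile& file, const ScanRange& range,
                      bool reset_on_failure) {
  if (ReadFromCache(file, range, reset_on_failure)) return "";
  return ReadNormal(file, range);
}
```

In this model, with data "0123456789abcdef" and a range of offset 4 and length 8, ReadRange() without the reset returns only "89ab", fewer bytes than the range length, while ReadRange() with the reset returns the full "456789ab".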
