Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3313

Change subject: IMPALA-3680: Reset the file read offset for failed hdfs cache 
reads
......................................................................

IMPALA-3680: Reset the file read offset for failed hdfs cache reads

Currently we don't reset the file read offset if a zero-copy read (ZCR)
fails. As a result, when we switch to the normal read path, we hit the
end of the scan range (eosr) before reading the expected data length.
This results in re-issuing the whole set of scan ranges and hence a
severe performance hit. This patch resets the file read offset to 0 if
the ReadFromCache() call fails.
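
The failure mode described above can be illustrated with a toy model. This
is only a sketch under stated assumptions, not the code in
disk-io-mgr-scan-range.cc: ToyScanRange, FailedReadFromCache(),
ReadWithFallback() and the bytes_read_ member are hypothetical names made up
for the illustration, and the "file" is just an in-memory string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Toy stand-in for a scan range. All names here are hypothetical and are
// not taken from the Impala source.
class ToyScanRange {
 public:
  explicit ToyScanRange(std::string data) : data_(std::move(data)) {}

  // Simulates a cache read that advances the offset by 'consumed' bytes and
  // then fails. If 'reset_on_failure' is true, the offset is rewound to 0,
  // which is the behaviour the commit message describes.
  bool FailedReadFromCache(int64_t consumed, bool reset_on_failure) {
    assert(consumed >= 0);
    bytes_read_ = std::min<int64_t>(consumed, Len());
    if (reset_on_failure) bytes_read_ = 0;
    return false;  // The caller must fall back to the normal read path.
  }

  // Normal read path: appends up to 'buffer_len' bytes to 'out' and sets
  // 'eosr' once the offset reaches the end of the range.
  void Read(int64_t buffer_len, std::string* out, bool* eosr) {
    const int64_t n = std::min<int64_t>(buffer_len, Len() - bytes_read_);
    out->append(data_, static_cast<std::size_t>(bytes_read_),
                static_cast<std::size_t>(n));
    bytes_read_ += n;
    *eosr = (bytes_read_ == Len());
  }

  int64_t Len() const { return static_cast<int64_t>(data_.size()); }

 private:
  std::string data_;
  int64_t bytes_read_ = 0;
};

// Runs a failed cache read followed by the fallback path, and returns the
// bytes the fallback path delivered before it reported eosr.
std::string ReadWithFallback(bool reset_on_failure) {
  ToyScanRange range("0123456789");
  range.FailedReadFromCache(6, reset_on_failure);
  std::string out;
  bool eosr = false;
  while (!eosr) range.Read(4, &out, &eosr);
  return out;
}
```

In this toy model, ReadWithFallback(false) reaches eosr after delivering only
the tail of the range, while ReadWithFallback(true) delivers the whole range.
A caller that sees eosr with less data than expected would have to re-issue
the range, which is the slowdown described above.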

This was hit while debugging IMPALA-3679, where queries on 1 GB of
cached data were running ~20x slower than non-cached runs.

Testing: It's difficult to simulate failed cache reads in tests, so I
tested this manually by adding logging to Read() and ReadFromCache().
The queries mentioned above on the 1 GB dataset also sped up, with
performance close to that of non-cached query runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
1 file changed, 7 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <[email protected]>
