wgtmac opened a new pull request #695: URL: https://github.com/apache/orc/pull/695
The current implementations of ZlibDecompressionStream::seek and BlockDecompressionStream::seek reset the state of the decompressor and the underlying file reader and throw away their buffers. This commit introduces two optimizations that reuse buffers still containing useful data, which reduces the time spent reading and decompressing the same data again.

The first case is when the seek target has already been read and decompressed into the output stream. The second case is when the seek target has already been read from the input stream but has not been decompressed yet, i.e. it is not in the output stream.

Tests:
- Run the ORC tests, and the Impala tests working on ORC tables.
- The regression that #476 would cause is not present anymore.
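
As a rough illustration of the two cases described above, the following sketch classifies a seek target against the buffers a stream still holds. It is not the code from this pull request: `BufferState`, `SeekAction`, and `classifySeek` are hypothetical names, and the model assumes a seek position is given as the file offset of a compressed block plus an uncompressed offset inside that block.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the buffers a decompression stream holds.
// None of these names come from the ORC code base.
struct BufferState {
  uint64_t inputStart;        // file offset of the first compressed byte still buffered
  uint64_t inputEnd;          // file offset one past the last buffered compressed byte
  uint64_t outputBlockStart;  // file offset of the block whose output is buffered
  uint64_t outputSize;        // uncompressed bytes of that block in the output buffer
  bool hasOutput;             // false if nothing has been decompressed yet
};

enum class SeekAction {
  ReuseOutput,  // case 1: target is already decompressed, only move the output pointer
  ReuseInput,   // case 2: target block is buffered but must still be decompressed
  FullReset     // fallback: reset the decompressor and re-read from the file
};

// Decide how a seek to (blockStart, offsetInBlock) can be served.
SeekAction classifySeek(const BufferState& s,
                        uint64_t blockStart,
                        uint64_t offsetInBlock) {
  assert(s.inputStart <= s.inputEnd);

  // Case 1: the target lies inside the bytes already decompressed
  // from the same block.
  if (s.hasOutput && blockStart == s.outputBlockStart &&
      offsetInBlock < s.outputSize) {
    return SeekAction::ReuseOutput;
  }

  // Case 2: the start of the target block is still in the input buffer,
  // so decompression can restart there without another file read.
  if (blockStart >= s.inputStart && blockStart < s.inputEnd) {
    return SeekAction::ReuseInput;
  }

  // Otherwise nothing buffered is useful for this seek.
  return SeekAction::FullReset;
}
```

In this simplified model only the `FullReset` branch corresponds to the old behaviour of discarding both buffers; the other two branches are where the time savings would come from.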
