[
https://issues.apache.org/jira/browse/KNOX-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656938#comment-16656938
]
Kevin Risden commented on KNOX-1530:
------------------------------------
Note: We won't see performance improvements until HttpClient's transparent
decompression is disabled AND we decompress only when rewrite rules apply.
Until both are in place, the decompression is still happening.
> Improve Gzip Compression Handling Performance
> ---------------------------------------------
>
> Key: KNOX-1530
> URL: https://issues.apache.org/jira/browse/KNOX-1530
> Project: Apache Knox
> Issue Type: Improvement
> Reporter: Kevin Risden
> Assignee: Kevin Risden
> Priority: Major
> Fix For: 1.2.0
>
>
> While looking at KNOX-1524, I found that requesting compressed results can
> cause performance impacts. Knox currently does the following:
> * Apache HttpClient transparently decompresses each response
> ** [Apache HttpClient 4.1 added support for
> this|https://stackoverflow.com/questions/2777076/does-apache-commons-httpclient-support-gzip]
> - HTTPCLIENT-834
> This led to recompressing some streams (KNOX-732 and KNOX-855) based on
> MIME types. Even if we call disableContentCompression, KNOX-565 added the
> following steps, which should only come into play when the HttpClient
> transparent decompression above is disabled (or for multipart gzip files,
> KNOX-1518):
> * Try to decompress the stream
> ** Currently uses try/catch
> * Run any rewrite filter rules
> * If decompressed, recompress the stream
> For many use cases, there is no reason to decompress and recompress the same
> stream. This is because there are no rewrite rules that apply. One example of
> this is Hive where beeline requests compression and HiveServer2 added support
> for returning compressed results with HIVE-17194. Another is with WebHDFS
> where we don't want to change the content going back to the client.
> I am planning to address this in a few pieces:
> * Determine if any rewrite rules apply before decompressing
> ** If rewrite rules apply, then decompress and recompress as before
> ** If rewrite rules do not apply, then copy stream as is
> * Remove gzip filter added by KNOX-732
> ** Figure out if there is another code path where decompress/recompress
> should happen
> * Disable httpclient content compression
> ** Need to make sure we handle decompress/recompress where necessary
> With all 3 improvements in place we should end up with:
> * One place where gzip decompress/recompress happens
> * Only decompress/recompress if rewrite rules match
> * Performance increases due to skipping unnecessary decompress/recompress
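
A minimal sketch of the first planned piece (decompress and recompress only
when rewrite rules apply, otherwise copy the stream as is). It is illustrative
only and is not taken from the Knox code base: the class name
GzipPassthroughSketch, the relay method, and the use of a null rewrite function
to mean "no rewrite rules apply" are all assumptions, and the body is buffered
in memory for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; none of these names exist in Knox.
final class GzipPassthroughSketch {

  // Relays a gzip-encoded body from in to out.
  // Assumption: rewrite == null means no rewrite rules apply to this response.
  static void relay(InputStream in, OutputStream out, UnaryOperator<String> rewrite)
      throws IOException {
    if (rewrite == null) {
      // No rules apply: copy the compressed bytes untouched.
      copy(in, out);
      return;
    }
    // Rules apply: decompress, rewrite, then recompress.
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    copy(new GZIPInputStream(in), plain);
    String body = new String(plain.toByteArray(), StandardCharsets.UTF_8);
    GZIPOutputStream gzipOut = new GZIPOutputStream(out);
    gzipOut.write(rewrite.apply(body).getBytes(StandardCharsets.UTF_8));
    gzipOut.finish(); // write the gzip trailer without closing out
  }

  // Plain buffered copy; works on Java 8.
  static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[8192];
    int n;
    while ((n = in.read(buffer)) != -1) {
      out.write(buffer, 0, n);
    }
  }

  // Helper for the demo: gzip a byte array.
  static byte[] gzip(byte[] plain) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
      gz.write(plain);
    }
    return buf.toByteArray();
  }

  // Helper for the demo: gunzip a byte array.
  static byte[] gunzip(byte[] compressed) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    copy(new GZIPInputStream(new ByteArrayInputStream(compressed)), buf);
    return buf.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] compressed = gzip("http://backend:50070/webhdfs".getBytes(StandardCharsets.UTF_8));

    // Case 1: no rewrite rules, bytes pass through unchanged.
    ByteArrayOutputStream untouched = new ByteArrayOutputStream();
    relay(new ByteArrayInputStream(compressed), untouched, null);
    System.out.println("identical: " + Arrays.equals(compressed, untouched.toByteArray()));

    // Case 2: a made-up rewrite rule, body is decompressed and recompressed.
    ByteArrayOutputStream rewritten = new ByteArrayOutputStream();
    relay(new ByteArrayInputStream(compressed), rewritten,
        s -> s.replace("backend:50070", "gateway:8443"));
    System.out.println(new String(gunzip(rewritten.toByteArray()), StandardCharsets.UTF_8));
  }
}
```

In the pass-through case no GZIPInputStream or GZIPOutputStream is ever
created, which is where the saving for Hive and WebHDFS style responses would
come from. A real implementation would presumably stream rather than buffer.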