[
https://issues.apache.org/jira/browse/KNOX-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673643#comment-16673643
]
Kevin Risden commented on KNOX-1530:
------------------------------------
This caused Hive performance through Knox to match what HiveServer2 can send in
HTTP mode. See KNOX-1524 for more details.
> Improve Gzip Compression Handling Performance
> ---------------------------------------------
>
> Key: KNOX-1530
> URL: https://issues.apache.org/jira/browse/KNOX-1530
> Project: Apache Knox
> Issue Type: Improvement
> Reporter: Kevin Risden
> Assignee: Kevin Risden
> Priority: Critical
> Fix For: 1.2.0
>
>
> While looking at KNOX-1524, I found that requesting compressed results can
> cause performance impacts. Knox currently does the following:
> * Apache HttpClient transparently decompresses each response
> ** [Apache HttpClient 4.1 added support for
> this|https://stackoverflow.com/questions/2777076/does-apache-commons-httpclient-support-gzip]
> - HTTPCLIENT-834
> This led to recompressing some streams (-KNOX-732,- -KNOX-855-, KNOX-856)
> based on MimeTypes. Even if we disableContentCompression, KNOX-565 added the
> following which should only come into play with the above HttpClient
> transparent decompression disabled (or multipart Gzip files - KNOX-1518):
> * Try to decompress the stream
> ** Currently uses try/catch
> * Run any rewrite filter rules
> * If decompressed, recompress the stream
> For many use cases, there is no reason to decompress and recompress the same
> stream. This is because there are no rewrite rules that apply. One example of
> this is Hive where beeline requests compression and HiveServer2 added support
> for returning compressed results with HIVE-17194. Another is with WebHDFS
> where we don't want to change the content going back to the client.
> I am planning to address this in a few pieces:
> * Determine if any rewrite rules apply before decompressing
> ** If rewrite rules apply, then decompress and recompress as before
> ** If rewrite rules do not apply, then copy stream as is
> * Remove gzip filter added by KNOX-732
> ** Figure out if there is another code path where decompress/recompress
> should happen
> ** We should not have to rely on Jetty to recompress content
> * Disable httpclient content compression
> ** Need to make sure we handle decompress/recompress where necessary
> With all 3 improvements in place we should end up with:
> * One place where gzip decompress/recompress happens
> * Only decompress/recompress if rewrite rules match
> * Performance increases due to skipping unnecessary decompress/recompress
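
A minimal sketch of the "only decompress/recompress if rewrite rules match" idea from the description above. This is not Knox code: the class `GzipPassThroughSketch`, the method `relay`, and the use of a nullable `UnaryOperator<String>` to stand in for "a rewrite rule applies" are all hypothetical, and the rewrite path buffers the whole body in memory purely to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; none of these names exist in Apache Knox.
public class GzipPassThroughSketch {

    /**
     * Relays a gzip-compressed upstream body to the client.
     * rewrite == null stands for "no rewrite rule applies" (an assumption of this sketch).
     */
    public static void relay(InputStream upstream, OutputStream client,
                             UnaryOperator<String> rewrite) {
        try {
            if (rewrite == null) {
                // No rule applies: copy the compressed bytes through untouched.
                copy(upstream, client);
                return;
            }
            // A rule applies: decompress, rewrite, then recompress in one place.
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            copy(new GZIPInputStream(upstream), plain);
            String rewritten = rewrite.apply(
                new String(plain.toByteArray(), StandardCharsets.UTF_8));
            GZIPOutputStream gz = new GZIPOutputStream(client);
            gz.write(rewritten.getBytes(StandardCharsets.UTF_8));
            gz.finish(); // write the gzip trailer but leave the client stream open
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper: gzip a byte array. */
    public static byte[] gzip(byte[] plain) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(plain);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper: gunzip a byte array. */
    public static byte[] gunzip(byte[] compressed) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            copy(new GZIPInputStream(new ByteArrayInputStream(compressed)), buf);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
    }

    public static void main(String[] args) {
        byte[] body = gzip("http://backend:10001/x".getBytes(StandardCharsets.UTF_8));

        // Pass-through path: bytes reach the client without being inflated.
        ByteArrayOutputStream passThrough = new ByteArrayOutputStream();
        relay(new ByteArrayInputStream(body), passThrough, null);
        System.out.println("pass-through bytes: " + passThrough.size());

        // Rewrite path: the backend host is replaced, then the body is recompressed.
        ByteArrayOutputStream rewritten = new ByteArrayOutputStream();
        relay(new ByteArrayInputStream(body), rewritten,
              s -> s.replace("backend:10001", "gateway:8443"));
        System.out.println(new String(gunzip(rewritten.toByteArray()), StandardCharsets.UTF_8));
    }
}
```

In the pass-through branch the gateway never inflates the payload, which is the case described for Hive and WebHDFS; the second branch keeps decompress/recompress in a single place for responses that do need rewriting.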