[ 
https://issues.apache.org/jira/browse/KNOX-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656974#comment-16656974
 ] 

Kevin Risden commented on KNOX-1530:
------------------------------------

The guiding principles I've been following for this are:
 * The client should dictate, through headers, whether the result is compressed
 * The backend should return the result compressed or not based on the headers from the client
 * Knox should not change the result type
 ** If the result was compressed, it should be returned to the client compressed
 ** If the result was uncompressed, it should be returned to the client uncompressed
 * Knox should not change the result if no outbound rewrite rules apply
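
As a rough illustration of these principles, here is a minimal sketch in plain Java. It is not Knox's actual code: the class name ResponseRelay, the relay method, and the use of a UnaryOperator<String> to stand in for "the outbound rewrite rules that apply" are all hypothetical, and the whole body is buffered in memory for simplicity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Knox.
final class ResponseRelay {

    /**
     * Relays a backend body to the client.
     * rewrite == null means "no outbound rewrite rule applies" (an assumption
     * made for this sketch); in that case the bytes are copied untouched.
     */
    static void relay(InputStream backend, OutputStream client,
                      boolean gzipped, UnaryOperator<String> rewrite) throws IOException {
        if (rewrite == null) {
            // Nothing to rewrite: keep the result exactly as the backend sent it.
            copy(backend, client);
            return;
        }
        // A rule applies: decompress only if needed, rewrite, then restore
        // the original encoding so the result type does not change.
        InputStream in = gzipped ? new GZIPInputStream(backend) : backend;
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        copy(in, body);
        String rewritten = rewrite.apply(new String(body.toByteArray(), StandardCharsets.UTF_8));
        byte[] out = rewritten.getBytes(StandardCharsets.UTF_8);
        if (gzipped) {
            GZIPOutputStream gz = new GZIPOutputStream(client);
            gz.write(out);
            gz.finish(); // write the gzip trailer without closing the client stream
        } else {
            client.write(out);
        }
    }

    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Small helpers for trying the sketch out.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    static String gunzip(byte[] bytes) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            copy(gz, bos);
        }
        return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With rewrite set to null, a gzipped body reaches the client byte for byte; with a rewrite function, the body is decompressed, rewritten, and recompressed, so the client still receives gzip.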

> Improve Gzip Compression Handling Performance
> ---------------------------------------------
>
>                 Key: KNOX-1530
>                 URL: https://issues.apache.org/jira/browse/KNOX-1530
>             Project: Apache Knox
>          Issue Type: Improvement
>            Reporter: Kevin Risden
>            Assignee: Kevin Risden
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> While looking at KNOX-1524, I found that requesting compressed results can 
> cause performance impacts. Knox currently does the following:
>  * Apache HttpClient transparently decompresses each response
>  ** [Apache HttpClient 4.1 added support for 
> this|https://stackoverflow.com/questions/2777076/does-apache-commons-httpclient-support-gzip]
>  - HTTPCLIENT-834
> This led to recompressing some streams (-KNOX-732,- -KNOX-855-, KNOX-856) 
> based on MimeTypes. Even if we disableContentCompression, KNOX-565 added the 
> following which should only come into play with the above HttpClient 
> transparent decompression disabled (or multipart Gzip files - KNOX-1518):
>  * Try to decompress the stream
>  ** Currently uses try/catch
>  * Run any rewrite filter rules
>  * If decompressed, recompress the stream
> For many use cases, there is no reason to decompress and recompress the same 
> stream. This is because there are no rewrite rules that apply. One example of 
> this is Hive where beeline requests compression and HiveServer2 added support 
> for returning compressed results with HIVE-17194. Another is with WebHDFS 
> where we don't want to change the content going back to the client.
> I am planning to address this in a few pieces:
>  * Determine if any rewrite rules apply before decompressing
>  ** If rewrite rules apply, then decompress and recompress as before
>  ** If rewrite rules do not apply, then copy stream as is
>  * Remove gzip filter added by KNOX-732
>  ** Figure out if there is another code path where decompress/recompress 
> should happen
>  ** We should not have to rely on Jetty to recompress content
>  * Disable httpclient content compression
>  ** Need to make sure we handle decompress/recompress where necessary
> With all 3 improvements in place we should end up with:
>  * One place where gzip decompress/recompress happens
>  * Only decompress/recompress if rewrite rules match
>  * Performance increases due to skipping unnecessary decompress/recompress
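
The issue description notes that the current code detects compression by trying to decompress inside a try/catch. One possible alternative, sketched here only as an assumption and not as what Knox does, is to peek at the two-byte gzip magic number (0x1f 0x8b, per RFC 1952) and push the bytes back. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of Knox.
final class GzipSniffer {

    /**
     * Returns true if the stream starts with the gzip magic bytes.
     * The stream must have a pushback buffer of at least 2 bytes;
     * the peeked bytes are unread, so the caller sees the full stream.
     */
    static boolean isGzipped(PushbackInputStream in) throws IOException {
        byte[] header = new byte[2];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 2 or EOF.
        while (n < 2) {
            int r = in.read(header, n, 2 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        if (n > 0) {
            in.unread(header, 0, n);
        }
        return n == 2
                && (header[0] & 0xFF) == 0x1f
                && (header[1] & 0xFF) == 0x8b;
    }
}
```

A caller would wrap the backend stream as new PushbackInputStream(stream, 2) before calling isGzipped, then pass the same wrapped stream on to whichever path handles the body.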



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
