[ https://issues.apache.org/jira/browse/KNOX-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on KNOX-1530 started by Kevin Risden.
------------------------------------------
> Improve Gzip Compression Handling Performance
> ---------------------------------------------
>
>                 Key: KNOX-1530
>                 URL: https://issues.apache.org/jira/browse/KNOX-1530
>             Project: Apache Knox
>          Issue Type: Improvement
>            Reporter: Kevin Risden
>            Assignee: Kevin Risden
>            Priority: Major
>             Fix For: 1.2.0
>
>
> While looking at KNOX-1524, I found that requesting compressed results can 
> degrade performance. Knox currently does the following:
>  * Apache HttpClient transparently decompresses each response
>  ** [Apache HttpClient 4.1 added support for 
> this|https://stackoverflow.com/questions/2777076/does-apache-commons-httpclient-support-gzip]
> This led to recompressing some streams (KNOX-732 and KNOX-855) based on 
> MimeTypes. Even if we call disableContentCompression, KNOX-565 added the 
> following steps, which should only come into play when the HttpClient 
> transparent decompression above is disabled (or for multipart Gzip files, 
> KNOX-1518):
>  * Try to decompress the stream
>  ** Currently uses try/catch
>  * Run any rewrite filter rules
>  * If decompressed, recompress the stream
> For many use cases, there is no reason to decompress and recompress the same 
> stream, because no rewrite rules apply. One example is Hive, where beeline 
> requests compression and HiveServer2 added support for returning compressed 
> results with HIVE-17194. Another is WebHDFS, where we don't want to change 
> the content going back to the client.
>  
> I am planning to address this in a few pieces:
>  * Determine if any rewrite rules apply before decompressing
>  ** If rewrite rules apply, then decompress and recompress as before
>  ** If rewrite rules do not apply, then copy stream as is
>  * Remove gzip filter added by KNOX-732
>  ** Figure out if there is another code path where decompress/recompress 
> should happen
>  * Disable httpclient content compression
>  ** Need to make sure we handle decompress/recompress where necessary
> With all 3 improvements in place we should end up with:
>  * One place where gzip decompress/recompress happens
>  * Only decompress/recompress if rewrite rules match
>  * Performance increases due to skipping unnecessary decompress/recompress
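
A minimal Java sketch of the first piece of the plan above (check whether rewrite rules apply, and only then decompress and recompress). This is not Knox code: the class `GzipPassThrough`, its methods, and the convention that a `null` rewrite function means "no rewrite rules apply" are all hypothetical, and the body is buffered in memory only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; none of these names exist in Knox.
final class GzipPassThrough {

  // Relays a gzip-compressed body from in to out.
  // Assumption: rewrite == null stands for "no rewrite rules apply".
  static void relay(InputStream in, OutputStream out, UnaryOperator<String> rewrite)
      throws IOException {
    if (rewrite == null) {
      // No rules match: copy the compressed bytes as is, no gzip work at all.
      copy(in, out);
      return;
    }
    // Rules match: decompress, rewrite, then recompress as before.
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    try (GZIPInputStream gzipIn = new GZIPInputStream(in)) {
      copy(gzipIn, plain);
    }
    String text = new String(plain.toByteArray(), StandardCharsets.UTF_8);
    GZIPOutputStream gzipOut = new GZIPOutputStream(out);
    gzipOut.write(rewrite.apply(text).getBytes(StandardCharsets.UTF_8));
    gzipOut.finish(); // write the gzip trailer without closing the caller's stream
  }

  // Compresses a string; used here only to build sample input.
  static byte[] gzip(String text) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPOutputStream gzipOut = new GZIPOutputStream(buffer)) {
      gzipOut.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return buffer.toByteArray();
  }

  // Decompresses gzip bytes back into a string.
  static String gunzip(byte[] data) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(data))) {
      copy(gzipIn, buffer);
    }
    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
  }

  private static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] chunk = new byte[8192];
    int read;
    while ((read = in.read(chunk)) != -1) {
      out.write(chunk, 0, read);
    }
  }
}
```

In the pass-through branch the bytes written to the client are identical to the bytes received from the backend, which is the behavior wanted for the Hive and WebHDFS cases described above. A real implementation would stream rather than buffer, and would decide whether rules apply from the gateway's rewrite configuration instead of a null check.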



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
