[ https://issues.apache.org/jira/browse/KNOX-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Risden updated KNOX-1530:
-------------------------------
Description:
While looking at KNOX-1524, I found that requesting compressed results can
cause performance impacts. Knox currently does the following:
* Apache HttpClient transparently decompresses each response
** [Apache HttpClient 4.1 added support for
this|https://stackoverflow.com/questions/2777076/does-apache-commons-httpclient-support-gzip]
- HTTPCLIENT-834
This led to recompressing some streams (-KNOX-732,- -KNOX-855-, KNOX-856)
based on MIME types. Even if we call disableContentCompression, KNOX-565 added
the following steps, which should only come into play when HttpClient's
transparent decompression is disabled (or for multipart gzip files, KNOX-1518):
* Try to decompress the stream
** Currently uses try/catch
* Run any rewrite filter rules
* If decompressed, recompress the stream
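As a rough illustration of the try/catch step listed above (this is not the
Knox code; the class and method names are hypothetical, and only java.util.zip
from the JDK is used), a stream can be probed by attempting to open it as gzip
and falling back to the raw bytes when the header check fails:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
final class TryGunzip {

  /** The stream to read from, plus whether the input turned out to be gzip. */
  static final class Result {
    final InputStream stream;
    final boolean wasGzip;

    Result(InputStream stream, boolean wasGzip) {
      this.stream = stream;
      this.wasGzip = wasGzip;
    }
  }

  /** Try to open the input as gzip; on failure rewind and return it unchanged. */
  static Result tryGunzip(InputStream in) throws IOException {
    BufferedInputStream buffered = new BufferedInputStream(in);
    // Remember the start so the header bytes can be re-read if this is not gzip.
    buffered.mark(512);
    try {
      // The constructor reads and validates the gzip header.
      return new Result(new GZIPInputStream(buffered), true);
    } catch (IOException notGzip) {
      buffered.reset();
      return new Result(buffered, false);
    }
  }

  /** Compress a byte array with gzip (used only to build demo input). */
  static byte[] gzip(byte[] data) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(data);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);

    Result fromGzip = tryGunzip(new ByteArrayInputStream(gzip(plain)));
    Result fromPlain = tryGunzip(new ByteArrayInputStream(plain));

    System.out.println(fromGzip.wasGzip + " "
        + new String(fromGzip.stream.readAllBytes(), StandardCharsets.UTF_8));
    System.out.println(fromPlain.wasGzip + " "
        + new String(fromPlain.stream.readAllBytes(), StandardCharsets.UTF_8));
  }
}
```

The sketch needs Java 9 or later for readAllBytes. It also shows why this step
has a cost: every stream pays for the header probe, and gzip streams are fully
inflated even when nothing downstream needs the plain content.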
For many use cases, there is no reason to decompress and recompress the same
stream. This is because there are no rewrite rules that apply. One example of
this is Hive where beeline requests compression and HiveServer2 added support
for returning compressed results with HIVE-17194. Another is with WebHDFS where
we don't want to change the content going back to the client.
I am planning to address this in a few pieces:
* Determine if any rewrite rules apply before decompressing
** If rewrite rules apply, then decompress and recompress as before
** If rewrite rules do not apply, then copy stream as is
* Remove gzip filter added by KNOX-732
** Figure out if there is another code path where decompress/recompress should
happen
** We should not have to rely on Jetty to recompress content
* Disable httpclient content compression
** Need to make sure we handle decompress/recompress where necessary
With all 3 improvements in place we should end up with:
* One place where gzip decompress/recompress happens
* Only decompress/recompress if rewrite rules match
* Performance increases due to skipping unnecessary decompress/recompress
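The first piece of the plan above could look roughly like the following. This
is only a sketch under assumptions: the ConditionalRewrite class, the relay
method, and the use of a UnaryOperator<String> to stand in for "the rewrite
rules that apply" are all hypothetical and are not Knox APIs. An empty Optional
means no rewrite rule matched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, for illustration only.
final class ConditionalRewrite {

  /**
   * Relay a response body. If no rewrite rule applies, the bytes are copied
   * as is, so a gzip body is never inflated. Otherwise the body is
   * decompressed (if gzipped), rewritten, and recompressed.
   */
  static void relay(InputStream in, OutputStream out, boolean gzipped,
                    Optional<UnaryOperator<String>> rewrite) throws IOException {
    if (!rewrite.isPresent()) {
      in.transferTo(out); // pass-through: no decompress/recompress
      return;
    }
    InputStream body = gzipped ? new GZIPInputStream(in) : in;
    String text = new String(body.readAllBytes(), StandardCharsets.UTF_8);
    byte[] rewritten = rewrite.get().apply(text).getBytes(StandardCharsets.UTF_8);
    if (gzipped) {
      GZIPOutputStream gz = new GZIPOutputStream(out);
      gz.write(rewritten);
      gz.finish(); // write the gzip trailer but leave 'out' open for the caller
    } else {
      out.write(rewritten);
    }
  }

  /** Compress a byte array with gzip (demo helper). */
  static byte[] gzip(byte[] data) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(data);
    }
    return bos.toByteArray();
  }

  /** Decompress a gzip byte array (demo helper). */
  static byte[] gunzip(byte[] data) throws IOException {
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
      return gz.readAllBytes();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] compressed = gzip("http://internal/a".getBytes(StandardCharsets.UTF_8));

    // No rule applies: the compressed bytes pass through untouched.
    ByteArrayOutputStream passthrough = new ByteArrayOutputStream();
    relay(new ByteArrayInputStream(compressed), passthrough, true, Optional.empty());
    System.out.println(passthrough.size() == compressed.length);

    // A rule applies: decompress, rewrite, recompress.
    UnaryOperator<String> rule = s -> s.replace("internal", "gateway");
    ByteArrayOutputStream rewritten = new ByteArrayOutputStream();
    relay(new ByteArrayInputStream(compressed), rewritten, true, Optional.of(rule));
    System.out.println(new String(gunzip(rewritten.toByteArray()), StandardCharsets.UTF_8));
  }
}
```

The sketch buffers the whole body in memory for brevity and needs Java 9 or
later (transferTo, readAllBytes); a real implementation would presumably stream
the rewrite instead.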
was:
While looking at KNOX-1524, I found that requesting compressed results can
cause performance impacts. Knox currently does the following:
* Apache HttpClient transparently decompresses each response
** [Apache HttpClient 4.1 added support for
this|https://stackoverflow.com/questions/2777076/does-apache-commons-httpclient-support-gzip]
- HTTPCLIENT-834
This led to recompressing some streams (KNOX-732 and KNOX-855) based on
MIME types. Even if we call disableContentCompression, KNOX-565 added the
following steps, which should only come into play when HttpClient's transparent
decompression is disabled (or for multipart gzip files, KNOX-1518):
* Try to decompress the stream
** Currently uses try/catch
* Run any rewrite filter rules
* If decompressed, recompress the stream
For many use cases, there is no reason to decompress and recompress the same
stream. This is because there are no rewrite rules that apply. One example of
this is Hive where beeline requests compression and HiveServer2 added support
for returning compressed results with HIVE-17194. Another is with WebHDFS where
we don't want to change the content going back to the client.
I am planning to address this in a few pieces:
* Determine if any rewrite rules apply before decompressing
** If rewrite rules apply, then decompress and recompress as before
** If rewrite rules do not apply, then copy stream as is
* Remove gzip filter added by KNOX-732
** Figure out if there is another code path where decompress/recompress should
happen
** We should not have to rely on Jetty to recompress content
* Disable httpclient content compression
** Need to make sure we handle decompress/recompress where necessary
With all 3 improvements in place we should end up with:
* One place where gzip decompress/recompress happens
* Only decompress/recompress if rewrite rules match
* Performance increases due to skipping unnecessary decompress/recompress
> Improve Gzip Compression Handling Performance
> ---------------------------------------------
>
> Key: KNOX-1530
> URL: https://issues.apache.org/jira/browse/KNOX-1530
> Project: Apache Knox
> Issue Type: Improvement
> Reporter: Kevin Risden
> Assignee: Kevin Risden
> Priority: Critical
> Fix For: 1.2.0