[ https://issues.apache.org/jira/browse/NIFI-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Pierce updated NIFI-6964:
------------------------------
    Description: 
The CompressContent processor honors its Compression Level property only when 
the GZIP compression format is selected. By contrast, the xz-lzma2 format 
always uses XZ compression level 6 (I read the CompressContent.java source 
code to verify this), disregarding whatever compression level you set on the 
processor itself.

As a side note, the xz compression format supports, amazingly enough, 10 levels 
of compression from 0 to 9 – the same as GZIP. The only difference that I can 
tell is that xz level 0 is not the absence of compression but the lightest 
compression possible (i.e. still some compression) – whereas GZIP compression 
level 0 just wraps the content in the container without compressing it.
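
To illustrate the GZIP side of that comparison, here is a minimal sketch that 
uses only the JDK (java.util.zip), not the NiFi code itself. The class and 
method names (GzipLevelDemo, LevelGzipOutputStream, gzip) are made up for this 
example, and the XZ side is not shown because it needs a third-party library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of NiFi.
final class GzipLevelDemo {

    /** GZIPOutputStream variant that lets the caller pick the Deflater level (0-9). */
    static final class LevelGzipOutputStream extends GZIPOutputStream {
        LevelGzipOutputStream(OutputStream out, int level) throws IOException {
            super(out);
            // 'def' is the protected Deflater inherited from DeflaterOutputStream.
            def.setLevel(level);
        }
    }

    /** Compresses data into a GZIP container at the given level. */
    static byte[] gzip(byte[] data, int level) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream out = new LevelGzipOutputStream(buffer, level)) {
            out.write(data);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024]; // highly compressible: all zeros
        for (int level : new int[] {0, 1, 6, 9}) {
            System.out.println("level " + level + " -> "
                    + gzip(data, level).length + " bytes");
        }
    }
}
```

At level 0 the output comes out slightly larger than the input, since the data 
is stored as-is inside the GZIP container; the higher levels shrink this 
particular input dramatically.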

I have a use case where I must use the xz-lzma2 format (don't ask why) and I 
have to send (using the XZ format) already highly-compressed content that is 
NOT XZ format to begin with. I have in excess of 500 GB of this sort of 
already highly compressed content to further compress into the XZ format on a 
daily basis.

The attached patch updates CompressContent.java so that the Compression Level 
property is applied to both the GZIP and the XZ-LZMA2 formats.

Please consider adding this patch to the baseline for this processor. I've 
tested it and the results are fantastic because I can crank down the 
compression level to 0 for XZ-LZMA2 now and use a lot less CPU. I'm generally 
seeing a 66% improvement in elapsed time to process highly compressed content 
using XZ format with compression level of 0 versus the hard-coded level 6 of 
the baseline code.

 

  was:
The CompressContent processor honors its Compression Level property only when 
the GZIP compression format is selected. By contrast, the xz-lzma2 format 
always uses XZ compression level 6 (I read the CompressContent.java source 
code to verify this), disregarding whatever compression level you set on the 
processor itself.

I have a use case where I must use the xz-lzma2 format (don't ask why) and I 
have to send (using the XZ format) already highly-compressed content that is 
NOT XZ format to begin with. I have in excess of 500 GB of this sort of 
already highly compressed content to further compress into the XZ format on a 
daily basis.

The attached patch updates CompressContent.java so that the Compression Level 
property is applied to both the GZIP and the XZ-LZMA2 formats.

Please consider adding this patch to the baseline for this processor. I've 
tested it and the results are fantastic because I can crank down the 
compression level to 0 for XZ-LZMA2 now and use a lot less CPU. I'm generally 
seeing a 66% improvement in elapsed time to process highly compressed content 
using XZ format with compression level of 0 versus the hard-coded level 6 of 
the baseline code.

 

         Labels: compression xz-lzma2  (was: xz-lzma2)

> Use compression level for xz-lzma2 format of the CompressContent processor
> --------------------------------------------------------------------------
>
>                 Key: NIFI-6964
>                 URL: https://issues.apache.org/jira/browse/NIFI-6964
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.10.0
>            Reporter: John Pierce
>            Priority: Minor
>              Labels: compression, xz-lzma2
>             Fix For: 1.11.0
>
>   Original Estimate: 4h
>          Time Spent: 10m
>  Remaining Estimate: 3h 50m
>
> The CompressContent processor honors its Compression Level property only 
> when the GZIP compression format is selected. By contrast, the xz-lzma2 
> format always uses XZ compression level 6 (I read the CompressContent.java 
> source code to verify this), disregarding whatever compression level you set 
> on the processor itself.
> As a side note, the xz compression format supports, amazingly enough, 10 
> levels of compression from 0 to 9 – the same as GZIP. The only difference 
> that I can tell is that xz level 0 is not the absence of compression but the 
> lightest compression possible (i.e. still some compression) – whereas GZIP 
> compression level 0 just wraps the content in the container without 
> compressing it.
> I have a use case where I must use the xz-lzma2 format (don't ask why) and I 
> have to send (using the XZ format) already highly-compressed content that is 
> NOT XZ format to begin with. I have in excess of 500 GB of this sort of 
> already highly compressed content to further compress into the XZ format on a 
> daily basis.
> The attached patch updates CompressContent.java so that the Compression 
> Level property is applied to both the GZIP and the XZ-LZMA2 formats.
> Please consider adding this patch to the baseline for this processor. I've 
> tested it and the results are fantastic because I can crank down the 
> compression level to 0 for XZ-LZMA2 now and use a lot less CPU. I'm generally 
> seeing a 66% improvement in elapsed time to process highly compressed content 
> using XZ format with compression level of 0 versus the hard-coded level 6 of 
> the baseline code.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
