[ 
https://issues.apache.org/jira/browse/JCRVLT-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320832#comment-16320832
 ] 

Dirk Rudolph edited comment on JCRVLT-259 at 1/10/18 7:00 PM:
--------------------------------------------------------------

I have a first draft ready here: 

https://github.com/Buuhuu/jackrabbit-filevault/tree/feature/JCRVLT-259-usingcommonscompress

As a first step I removed the usage of java's ZIP and JAR apis from 
PackageManagerImpl and JcrPackageManagerImpl, replacing it with 
commons-compress APIs. Those already introduce an abstraction so that its 
possible to create any kind of archive (ZipArchive, TarArchive)

https://github.com/Buuhuu/jackrabbit-filevault/commit/76e12ab004f35b13848f0a80a69fd4343f41b709
https://github.com/Buuhuu/jackrabbit-filevault/commit/51a4e334c891d55889d79898be019b43e82b26d0
https://github.com/Buuhuu/jackrabbit-filevault/commit/04cb0bcdc7b4d645bc97c98585e5044ccded7267

Then I moved the zip independent logic of ZipStreamArchive to 
AbstractStreamArchive and implemented a TarStreamArchive. ZipArchive stays 
untouched, as for TarArchive a TarStreamArchive can be used based on the file's 
InputStream. This is because the entry's headers are not at the end of the 
stream for tar and can therefor be read in sequence without any performance 
impact. 

https://github.com/Buuhuu/jackrabbit-filevault/commit/98c8b27cd9ba74439f6d3893bae36a6b709a7f9a
https://github.com/Buuhuu/jackrabbit-filevault/commit/ec74b88f786ef0946e25cade312f366a252e22ce

Last but not least I did the same abstraction on JarExporter implementing a 
TarExpoter and finally adding the compression method to use to the 
ExportOptions. 

https://github.com/Buuhuu/jackrabbit-filevault/commit/20b0243fcdc4e7b83cf9779df72dbf2c0ef4f1e8
https://github.com/Buuhuu/jackrabbit-filevault/commit/d39bcd181098a6e8a5b214bc3d03235bade9d047
https://github.com/Buuhuu/jackrabbit-filevault/commit/e15eae5a8782eef5b63b7bdaf1790675849156fe

Basically the biggest change is, that there is now a mandatory dependency to 
commons-compress which is needed to support tar. Tar acts as container for any 
compression method different to deflate (the default one). I also added an 
optional dependency for snappy-java (supporting jni for performance reasons but 
falling back to commons-compress' implementation). 

Another kind of ugly thing that was necessary is the logic in 
TarExporter#createEntry(). Unfortunately (same as for zips stored method) tar 
requires the entry's size to be known. As there is a method that accepts a 
plain InputStream, and as the VaultFile#length() might return null, I had to 
implement a mechanism that counts the size of the binary. Doing so I write the 
InputStream to memory, switching to a file when a certain size has been 
exceeded. Afterwards instead of spooling the original binary to the file, I use 
the memory/file copy.

In the ExportOptions, there is now a setter for the compression method, which 
can be any of the compression methods supported by commos-compress. 
Additionally there is an AutodetectArchive and AutodetectStreamArchive which 
detects the compression method and creates a TarArchive or in case its a zip a 
ZipArchive as it was before. The code is only a draft and a couple of things 
need to be renamed and/or moved around but first I want to ask for some 
feedback. Anyway the unit tests are green and I added also some for snappy in 
TestBinarylessExport and TestMappedExport.

[~marett] do you have the performance benchmark you used for Sling Content 
Distribution mentioned here 
[SLING-7362|https://issues.apache.org/jira/browse/SLING-7362?focusedCommentId=16318142&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16318142]
 still available and if so would you be so kind to check if there is a gain 
with snappy as compression method?


was (Author: diru):
I have a first draft ready here: 

https://github.com/Buuhuu/jackrabbit-filevault/tree/feature/JCRVLT-259-usingcommonscompress

As a first step I removed the usage of java's ZIP and JAR apis from 
PackageManagerImpl and JcrPackageManagerImpl, replacing it with 
commons-compress APIs. Those already introduce an abstraction so that its 
possible to create any kind of archive (ZipArchive, TarArchive)

https://github.com/Buuhuu/jackrabbit-filevault/commit/76e12ab004f35b13848f0a80a69fd4343f41b709
https://github.com/Buuhuu/jackrabbit-filevault/commit/51a4e334c891d55889d79898be019b43e82b26d0
https://github.com/Buuhuu/jackrabbit-filevault/commit/04cb0bcdc7b4d645bc97c98585e5044ccded7267

Then I moved the zip independent logic of ZipStreamArchive to 
AbstractStreamArchive and implemented a TarStreamArchive. ZipArchive stays 
untouched, as for TarArchive a TarStreamArchive can be used based on the file's 
InputStream. This is because the entry's headers are not at the end of the 
stream for tar and can therefor be read in sequence without any performance 
impact. 

https://github.com/Buuhuu/jackrabbit-filevault/commit/98c8b27cd9ba74439f6d3893bae36a6b709a7f9a
https://github.com/Buuhuu/jackrabbit-filevault/commit/ec74b88f786ef0946e25cade312f366a252e22ce

Last but not least I did the same abstraction on JarExporter implementing a 
TarExpoter and finally adding the compression method to use to the 
ExportOptions. 

https://github.com/Buuhuu/jackrabbit-filevault/commit/20b0243fcdc4e7b83cf9779df72dbf2c0ef4f1e8
https://github.com/Buuhuu/jackrabbit-filevault/commit/d39bcd181098a6e8a5b214bc3d03235bade9d047
https://github.com/Buuhuu/jackrabbit-filevault/commit/e15eae5a8782eef5b63b7bdaf1790675849156fe

Basically the biggest change is, that there is now a mandatory dependency to 
commons-compress which is needed to support tar. Tar acts as container for any 
compression method different to deflate (the default one). I also added an 
optional dependency for snappy-java (supporting jni for performance reasons but 
falling back to commons-compress' implementation).

In the ExportOptions, there is now a setter for the compression method, which 
can be any of the compression methods supported by commos-compress. 
Additionally there is an AutodetectArchive and AutodetectStreamArchive which 
detects the compression method and creates a TarArchive or in case its a zip a 
ZipArchive as it was before. The code is only a draft and a couple of things 
need to be renamed and/or moved around but first I want to ask for some 
feedback. Anyway the unit tests are green and I added also some for snappy in 
TestBinarylessExport and TestMappedExport.

[~marett] do you have the performance benchmark you used for Sling Content 
Distribution mentioned here 
[SLING-7362|https://issues.apache.org/jira/browse/SLING-7362?focusedCommentId=16318142&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16318142]
 still available and if so would you be so kind to check if there is a gain 
with snappy as compression method?

> Support pluggable compression formats
> -------------------------------------
>
>                 Key: JCRVLT-259
>                 URL: https://issues.apache.org/jira/browse/JCRVLT-259
>             Project: Jackrabbit FileVault
>          Issue Type: New Feature
>          Components: Packaging
>            Reporter: Dirk Rudolph
>
> The new feature to suppress compression for already compressed binaries 
> introduced in JCRVLT-163, has been implemented with Apache Sling Content 
> Distribution in mind to get best performance while still keep bytes on the 
> wire as small as affordable [0]. 
> Wouldn't it make sense for such cases to support different compression 
> formats as well, especially thinking about snappy or lz4?
> Currently there is a pair for compressing and decompressing implementations, 
> JarExporter and ZipArchive/ZipStreamArchive. Those are used by the 
> PackageManagerImpl for example to open or assemble a new package. Adding 
> support for further compression formats could be done for example by 
> implementing a registry holding the constructors for compression and 
> decompression implementations, where the one of choice could be requested by 
> the ExportOptions (defaulting to jar/zip as it is now).
> [0] 
> https://github.com/apache/sling-org-apache-sling-distribution-core/blob/master/src/main/java/org/apache/sling/distribution/serialization/impl/vlt/VltUtils.java#L190



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to