[ https://issues.apache.org/jira/browse/JCRVLT-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320832#comment-16320832 ]
Dirk Rudolph edited comment on JCRVLT-259 at 1/10/18 7:00 PM: -------------------------------------------------------------- I have a first draft ready here: https://github.com/Buuhuu/jackrabbit-filevault/tree/feature/JCRVLT-259-usingcommonscompress As a first step I removed the usage of java's ZIP and JAR apis from PackageManagerImpl and JcrPackageManagerImpl, replacing it with commons-compress APIs. Those already introduce an abstraction so that its possible to create any kind of archive (ZipArchive, TarArchive) https://github.com/Buuhuu/jackrabbit-filevault/commit/76e12ab004f35b13848f0a80a69fd4343f41b709 https://github.com/Buuhuu/jackrabbit-filevault/commit/51a4e334c891d55889d79898be019b43e82b26d0 https://github.com/Buuhuu/jackrabbit-filevault/commit/04cb0bcdc7b4d645bc97c98585e5044ccded7267 Then I moved the zip independent logic of ZipStreamArchive to AbstractStreamArchive and implemented a TarStreamArchive. ZipArchive stays untouched, as for TarArchive a TarStreamArchive can be used based on the file's InputStream. This is because the entry's headers are not at the end of the stream for tar and can therefor be read in sequence without any performance impact. https://github.com/Buuhuu/jackrabbit-filevault/commit/98c8b27cd9ba74439f6d3893bae36a6b709a7f9a https://github.com/Buuhuu/jackrabbit-filevault/commit/ec74b88f786ef0946e25cade312f366a252e22ce Last but not least I did the same abstraction on JarExporter implementing a TarExpoter and finally adding the compression method to use to the ExportOptions. https://github.com/Buuhuu/jackrabbit-filevault/commit/20b0243fcdc4e7b83cf9779df72dbf2c0ef4f1e8 https://github.com/Buuhuu/jackrabbit-filevault/commit/d39bcd181098a6e8a5b214bc3d03235bade9d047 https://github.com/Buuhuu/jackrabbit-filevault/commit/e15eae5a8782eef5b63b7bdaf1790675849156fe Basically the biggest change is, that there is now a mandatory dependency to commons-compress which is needed to support tar. Tar acts as container for any compression method different to deflate (the default one). I also added an optional dependency for snappy-java (supporting jni for performance reasons but falling back to commons-compress' implementation). Another kind of ugly thing that was necessary is the logic in TarExporter#createEntry(). Unfortunately (same as for zips stored method) tar requires the entry's size to be known. As there is a method that accepts a plain InputStream, and as the VaultFile#length() might return null, I had to implement a mechanism that counts the size of the binary. Doing so I write the InputStream to memory, switching to a file when a certain size has been exceeded. Afterwards instead of spooling the original binary to the file, I use the memory/file copy. In the ExportOptions, there is now a setter for the compression method, which can be any of the compression methods supported by commos-compress. Additionally there is an AutodetectArchive and AutodetectStreamArchive which detects the compression method and creates a TarArchive or in case its a zip a ZipArchive as it was before. The code is only a draft and a couple of things need to be renamed and/or moved around but first I want to ask for some feedback. Anyway the unit tests are green and I added also some for snappy in TestBinarylessExport and TestMappedExport. [~marett] do you have the performance benchmark you used for Sling Content Distribution mentioned here [SLING-7362|https://issues.apache.org/jira/browse/SLING-7362?focusedCommentId=16318142&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16318142] still available and if so would you be so kind to check if there is a gain with snappy as compression method? was (Author: diru): I have a first draft ready here: https://github.com/Buuhuu/jackrabbit-filevault/tree/feature/JCRVLT-259-usingcommonscompress As a first step I removed the usage of java's ZIP and JAR apis from PackageManagerImpl and JcrPackageManagerImpl, replacing it with commons-compress APIs. Those already introduce an abstraction so that its possible to create any kind of archive (ZipArchive, TarArchive) https://github.com/Buuhuu/jackrabbit-filevault/commit/76e12ab004f35b13848f0a80a69fd4343f41b709 https://github.com/Buuhuu/jackrabbit-filevault/commit/51a4e334c891d55889d79898be019b43e82b26d0 https://github.com/Buuhuu/jackrabbit-filevault/commit/04cb0bcdc7b4d645bc97c98585e5044ccded7267 Then I moved the zip independent logic of ZipStreamArchive to AbstractStreamArchive and implemented a TarStreamArchive. ZipArchive stays untouched, as for TarArchive a TarStreamArchive can be used based on the file's InputStream. This is because the entry's headers are not at the end of the stream for tar and can therefor be read in sequence without any performance impact. https://github.com/Buuhuu/jackrabbit-filevault/commit/98c8b27cd9ba74439f6d3893bae36a6b709a7f9a https://github.com/Buuhuu/jackrabbit-filevault/commit/ec74b88f786ef0946e25cade312f366a252e22ce Last but not least I did the same abstraction on JarExporter implementing a TarExpoter and finally adding the compression method to use to the ExportOptions. https://github.com/Buuhuu/jackrabbit-filevault/commit/20b0243fcdc4e7b83cf9779df72dbf2c0ef4f1e8 https://github.com/Buuhuu/jackrabbit-filevault/commit/d39bcd181098a6e8a5b214bc3d03235bade9d047 https://github.com/Buuhuu/jackrabbit-filevault/commit/e15eae5a8782eef5b63b7bdaf1790675849156fe Basically the biggest change is, that there is now a mandatory dependency to commons-compress which is needed to support tar. Tar acts as container for any compression method different to deflate (the default one). I also added an optional dependency for snappy-java (supporting jni for performance reasons but falling back to commons-compress' implementation). In the ExportOptions, there is now a setter for the compression method, which can be any of the compression methods supported by commos-compress. Additionally there is an AutodetectArchive and AutodetectStreamArchive which detects the compression method and creates a TarArchive or in case its a zip a ZipArchive as it was before. The code is only a draft and a couple of things need to be renamed and/or moved around but first I want to ask for some feedback. Anyway the unit tests are green and I added also some for snappy in TestBinarylessExport and TestMappedExport. [~marett] do you have the performance benchmark you used for Sling Content Distribution mentioned here [SLING-7362|https://issues.apache.org/jira/browse/SLING-7362?focusedCommentId=16318142&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16318142] still available and if so would you be so kind to check if there is a gain with snappy as compression method? > Support pluggable compression formats > ------------------------------------- > > Key: JCRVLT-259 > URL: https://issues.apache.org/jira/browse/JCRVLT-259 > Project: Jackrabbit FileVault > Issue Type: New Feature > Components: Packaging > Reporter: Dirk Rudolph > > The new feature to suppress compression for already compressed binaries > introduced in JCRVLT-163, has been implemented with Apache Sling Content > Distribution in mind to get best performance while still keep bytes on the > wire as small as affordable [0]. > Wouldn't it make sense for such cases to support different compression > formats as well, especially thinking about snappy or lz4? > Currently there is a pair for compressing and decompressing implementations, > JarExporter and ZipArchive/ZipStreamArchive. Those are used by the > PackageManagerImpl for example to open or assemble a new package. Adding > support for further compression formats could be done for example by > implementing a registry holding the constructors for compression and > decompression implementations, where the one of choice could be requested by > the ExportOptions (defaulting to jar/zip as it is now). > [0] > https://github.com/apache/sling-org-apache-sling-distribution-core/blob/master/src/main/java/org/apache/sling/distribution/serialization/impl/vlt/VltUtils.java#L190 -- This message was sent by Atlassian JIRA (v6.4.14#64029)