Hi Thomas,

2017-03-07 11:27 GMT+01:00 Thomas Mueller:
> Hi,
>
> As for configuration: what is the reason for having a configuration option?
>
> Detecting whether data is compressible can be done with low overhead, without having to look at the content type, and without having to use configuration options:
>
> http://stackoverflow.com/questions/7027022/how-to-efficiently-predict-if-data-is-compressible
>
> Sample code is available in one of the answers ("I implemented a few methods to test if data is compressible…"). It is quite simple, and only needs to process 256 bytes. Both the "Partial Entropy" and the "Simplified Compression" methods work relatively well.
>
> This is not designed to be a "perfect" solution for the problem. It's a low-overhead heuristic that will reduce the compression overhead on average.

This sounds very nice :-) We could indeed drop the configurable list of MIME types.
IMO we should still allow tweaking between best performance and best compression, though, in order to accommodate different use cases.

I thought about covering both aspects in JCRVLT-163, but I have now narrowed the focus of JCRVLT-163 to avoiding the compression of binaries (with or without auto-detection), and created JCRVLT-164 for making the default compression level configurable.

Regards,
Timothee

> Regards,
> Thomas
>
>
> On 06.03.2017 at 16:43, Timothée Maret wrote:
>
> Hi,
>
> With Sling content distribution (using FileVault), we observe significantly lower throughput for content packages containing binaries. The main bottleneck seems to be the compression algorithm applied to every element contained in the content package.
>
> I think that we could improve the throughput significantly, simply by not re-compressing binaries that are already compressed.
> In order to figure out which binaries are already compressed, we could match the content type stored along with the binary against a configurable list of content types.
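The two detection ideas discussed in this thread (matching the stored content type against a skip list, and a low-overhead check on the first 256 bytes) can be combined in one small decision function. The Java sketch below is illustrative only: the class name `CompressionHeuristic`, the example content types and the 6.5 bits/byte threshold are assumptions, not FileVault code. It implements a plain Shannon-entropy variant of the "Partial Entropy" idea, not the code from the Stack Overflow answer.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper: decides whether a binary is worth deflating.
 * The skip list and threshold are illustrative assumptions, not FileVault values.
 */
final class CompressionHeuristic {

    // Example content types that are usually already compressed (assumed list).
    static final Set<String> PRECOMPRESSED_TYPES = Set.of(
            "image/jpeg", "image/png", "video/mp4", "application/zip", "application/gzip");

    // Only this many leading bytes are inspected, matching the 256 bytes mentioned above.
    static final int SAMPLE_SIZE = 256;

    // Assumed cut-off in bits per byte. A small random sample scores below the
    // theoretical maximum of 8.0, so the threshold sits well under it.
    static final double ENTROPY_THRESHOLD = 6.5;

    /** Content-type check: ignores parameters such as "; charset=..." and case. */
    static boolean isPrecompressedType(String contentType) {
        if (contentType == null) {
            return false;
        }
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return PRECOMPRESSED_TYPES.contains(base.trim().toLowerCase(Locale.ROOT));
    }

    /** Shannon entropy, in bits per byte, of the first SAMPLE_SIZE bytes. */
    static double sampleEntropy(byte[] data) {
        int n = Math.min(data.length, SAMPLE_SIZE);
        if (n == 0) {
            return 0.0;
        }
        int[] counts = new int[256];
        for (int i = 0; i < n; i++) {
            counts[data[i] & 0xFF]++;
        }
        double entropy = 0.0;
        for (int count : counts) {
            if (count > 0) {
                double p = (double) count / n;
                entropy -= p * Math.log(p) / Math.log(2);
            }
        }
        return entropy;
    }

    /** Skip known compressed types; otherwise compress only low-entropy samples. */
    static boolean shouldCompress(String contentType, byte[] data) {
        return !isPrecompressedType(contentType) && sampleEntropy(data) < ENTROPY_THRESHOLD;
    }
}
```

The content-type check short-circuits for listed types, and the entropy fallback covers binaries whose type is missing or not listed. Note that the leading bytes of some formats may be dominated by file headers, so where the sample is taken from is itself a tuning choice.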
>
> I have done some micro tests with this idea (patch in [0]). I think that the results are promising.
>
> Exporting a single 250 MB JPEG is 80% faster (22.4 sec -> 4.3 sec) for a 3% bigger content package (233.2 MB -> 240.4 MB).
> Exporting AEM OOTB /content/dam is 50% faster (11.9 sec -> 5.9 sec) for a 5% bigger content package (92.8 MB -> 97.4 MB).
> Import for the same two cases is 66% and 32% faster, respectively.
>
> I think this could be done by default, with a configurable list of types that skip compression.
> Alternatively, it could be done at the project level, by extending FileVault with the following:
>
> 1. For each package, allow defining the default compression level (best compression, best speed [1])
> 2. Expose an API that allows plugging in custom logic to decide how to compress a given artefact
>
> In either case, the changes would be backward compatible. Content packages created with the new code would be installable on instances running the old code and vice versa.
>
> wdyt?
>
> Regards,
>
> Timothee
>
>
> [0] https://github.com/tmaret/jackrabbit-filevault/tree/performance-avoid-compressing-already-compressed-binaries-based-on-content-type-detection
> [1] https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#BEST_SPEED

--
Timothée Maret
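Items 1 and 2 of the proposal (a per-package default compression level and a plug-in that decides how each artefact is compressed) could look roughly like the Java sketch below. `CompressionLevelStrategy` and `PackageWriterSketch` are hypothetical names, not existing FileVault APIs; the sketch relies only on `java.util.zip.ZipOutputStream.setLevel`, which applies to subsequent DEFLATED entries and can therefore change between entries.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical plug-in point (item 2): chooses the Deflater level for one artefact. */
@FunctionalInterface
interface CompressionLevelStrategy {
    /** Must return Deflater.DEFAULT_COMPRESSION or a level between 0 and 9. */
    int levelFor(String path, String contentType, int packageDefaultLevel);

    /** Strategy that always uses the per-package default level. */
    static CompressionLevelStrategy packageDefault() {
        return (path, contentType, packageDefaultLevel) -> packageDefaultLevel;
    }
}

/** Hypothetical package writer: one ZIP entry per artefact, level chosen per entry. */
final class PackageWriterSketch {
    private final int defaultLevel;                  // item 1: per-package default level
    private final CompressionLevelStrategy strategy; // item 2: pluggable decision logic

    PackageWriterSketch(int defaultLevel, CompressionLevelStrategy strategy) {
        this.defaultLevel = defaultLevel;
        this.strategy = strategy;
    }

    byte[] write(String[] paths, String[] contentTypes, byte[][] contents) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i < paths.length; i++) {
                // setLevel affects subsequent DEFLATED entries, so set it before each entry.
                zip.setLevel(strategy.levelFor(paths[i], contentTypes[i], defaultLevel));
                zip.putNextEntry(new ZipEntry(paths[i]));
                zip.write(contents[i]);
                zip.closeEntry();
            }
        } catch (IOException e) {
            // Writing to memory should not fail; wrap to keep the sketch's signature simple.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For example, a strategy such as `(path, type, level) -> "image/jpeg".equals(type) ? Deflater.NO_COMPRESSION : level` skips compression for JPEGs only. Returning `Deflater.NO_COMPRESSION` keeps the DEFLATED entry format, so existing readers need no changes, at the cost of a few bytes of stored-block overhead. Writing such entries with the STORED method instead would require setting the size, compressed size and CRC-32 on the `ZipEntry` before calling `putNextEntry`.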
