Hi,

one issue to remember is that you can only change the compression level per zip entry. I didn't test much, but the javadoc says:

    public void setLevel(int level)
    Sets the compression level for subsequent entries which are DEFLATED.
    The default setting is DEFAULT_COMPRESSION.

I'm not exactly sure whether zip retains the dictionary when you switch compression levels, but I would assume not. E.g. if you have a lot of small text files interleaved with binaries, then the text files are probably not compressed well. That might not be a problem, though.

It would be interesting to see some tests that take a typical content package with many text files (.content.xml) and a few compressed binaries (JPEGs):

- what is the size difference of the final binary with no compression at all?
- what is the size difference of the final binary with interleaved compression?
- what are the performance characteristics of packing/unpacking the zips?

Regards,
Toby

On Thu, Mar 9, 2017 at 8:10 PM, Thomas Mueller <[email protected]> wrote:
> Hi,
>
> > I think your help is mandatory, given the level of voodoo in the five
> > lines you propose :-)
>
> Sure, I can help.
>
> > I did some preliminary tests with the "partial entropy" method … and it
> > seems the algorithm works, but it does not get as fast as the content type
> > detection method.
>
> Note you only need to test about 256 bytes, not the whole binary. Sure,
> the more the better.
>
> > Maybe ultimately we could keep both heuristics.
>
> I agree. But not to speed things up: to avoid false positives / negatives.
> Auto-detection is far from perfect.
>
> > Start with the content type detection that would match against MIME
> > types we know for sure are compressed (expected to be a reasonably fixed
> > and short list of MIME types).
>
> I would probably use the following logic:
>
> * list of MIME types that should be compressed (text/plain and so on)
> * list of MIME types that should not be compressed (application/zip,
>   application/java-archive, and so on)
>
> For the remainder, and if you don't know the MIME type, I would use
> auto-detection.
>
> Regards,
> Thomas
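A minimal Java sketch of the policy discussed in this thread, assuming Java 11+ and the standard java.util.zip.ZipOutputStream. Known MIME types decide first, anything else falls back to a Shannon-entropy estimate over the first 256 bytes, and the deflate level is chosen per entry by calling setLevel before each putNextEntry. The class and method names, the contents of both MIME lists and the 6.5 bits/byte cut-off are hypothetical choices for illustration, not taken from any existing code base.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.Set;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: the MIME lists, sample size and threshold are
// illustrative assumptions, not values from an existing code base.
final class ZipCompressionPolicy {

    // Types we always deflate (text-like content such as .content.xml).
    static final Set<String> COMPRESS = Set.of(
            "text/plain", "text/xml", "application/xml", "application/json");

    // Types we never deflate because they are already compressed.
    static final Set<String> DO_NOT_COMPRESS = Set.of(
            "application/zip", "application/java-archive", "image/jpeg", "image/png");

    static final int SAMPLE_SIZE = 256; // bytes inspected by auto-detection
    // Assumed cut-off in bits per byte. A 256-byte sample of random data
    // measures noticeably below the theoretical 8, so stay well under it.
    static final double ENTROPY_THRESHOLD = 6.5;

    private ZipCompressionPolicy() {}

    /** Shannon entropy (bits per byte) of the first SAMPLE_SIZE bytes of data. */
    static double sampleEntropy(byte[] data) {
        int n = Math.min(data.length, SAMPLE_SIZE);
        if (n == 0) {
            return 0.0;
        }
        int[] counts = new int[256];
        for (int i = 0; i < n; i++) {
            counts[data[i] & 0xFF]++; // histogram of unsigned byte values
        }
        double bits = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / n;
                bits -= p * Math.log(p) / Math.log(2);
            }
        }
        return bits;
    }

    /** MIME lists decide first; unknown or unlisted types fall back to entropy. */
    static boolean shouldCompress(String mimeType, byte[] data) {
        if (mimeType != null && COMPRESS.contains(mimeType)) {
            return true;
        }
        if (mimeType != null && DO_NOT_COMPRESS.contains(mimeType)) {
            return false;
        }
        return sampleEntropy(data) < ENTROPY_THRESHOLD;
    }

    /**
     * Writes the entries to an in-memory zip. setLevel only applies to
     * entries started after the call, so it is set before each putNextEntry.
     * With adaptive == false every entry uses NO_COMPRESSION.
     */
    static byte[] writeZip(Map<String, byte[]> entries, Map<String, String> mimeTypes,
                           boolean adaptive) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                boolean compress = adaptive
                        && shouldCompress(mimeTypes.get(e.getKey()), e.getValue());
                zip.setLevel(compress ? Deflater.DEFAULT_COMPRESSION : Deflater.NO_COMPRESSION);
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }
}
```

Calling writeZip(entries, types, true) and writeZip(entries, types, false) on the same input and comparing the two lengths gives a rough version of the size comparison asked for above; it says nothing about pack/unpack speed. setLevel(Deflater.NO_COMPRESSION) keeps the entry method DEFLATED and emits stored deflate blocks, which avoids precomputing the CRC and sizes that ZipEntry.STORED requires. ZipOutputStream also resets its Deflater when an entry is closed, so no dictionary carries over between entries, whatever the level.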
