Hi,

> I think your help is mandatory, given the level of voodoo in the five lines
> you propose :-)

Sure, I can help.

> I did some preliminary tests with the "partial entropy" method … and it seems
> the algorithm works but it does not get as fast as the content type detection
> method.

Note that you only need to test about 256 bytes, not the whole binary. Sure,
the more bytes you test, the better.

> Maybe ultimately we could keep both heuristics.

I agree. But not to speed things up: to avoid false positives and false
negatives. Auto-detection is far from perfect.

> Start with the content type detection that would match against MIME types we
> know for sure are compressed (expected to be a reasonably fixed and short
> list of MIME types).

I would probably use the following logic:

* list of MIME types that should be compressed (text/plain and so on)
* list of MIME types that should not be compressed (application/zip,
  application/java-archive, and so on)

For the remainder, and if you don't know the MIME type, I would use
auto-detection.

Regards,
Thomas
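
P.S. A rough sketch of the logic above, in Python for brevity. Everything named here is an assumption for illustration: the two example lists, the function names, and the threshold value are hypothetical, and plain Shannon entropy stands in for the "partial entropy" method, whose details are not spelled out in this message.

```python
import math
from collections import Counter

# Hypothetical example lists; real lists would be longer.
COMPRESS = {"text/plain", "text/html", "application/xml"}
DO_NOT_COMPRESS = {"application/zip", "application/java-archive"}

SAMPLE_SIZE = 256         # only look at about the first 256 bytes
ENTROPY_THRESHOLD = 6.5   # assumed cut-off in bits per byte; needs tuning


def sample_entropy(data: bytes) -> float:
    """Shannon entropy, in bits per byte, of the first SAMPLE_SIZE bytes."""
    sample = data[:SAMPLE_SIZE]
    if not sample:
        return 0.0
    n = len(sample)
    counts = Counter(sample)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def should_compress(mime_type, data: bytes) -> bool:
    """Check the two MIME type lists first, then fall back to auto-detection."""
    if mime_type in COMPRESS:
        return True
    if mime_type in DO_NOT_COMPRESS:
        return False
    # Unknown or missing MIME type: low entropy suggests compressible data.
    return sample_entropy(data) < ENTROPY_THRESHOLD
```

With this ordering the lists decide the known cases, so a wrong guess by the auto-detection can only affect content whose type is unknown or not listed.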