Hi,

> I think your help is mandatory, given the level of voodoo in the five lines 
> you propose :-)

Sure, I can help.

> I did some preliminary tests with the "partial entropy" method … and it seems 
> the algorithm works but it does not get as fast as the content type detection 
> method.

Note that you only need to test about 256 bytes, not the whole binary. Of 
course, the more bytes you test, the more reliable the result.
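
To illustrate what testing a small sample could look like, here is a rough sketch. It computes plain Shannon entropy over the first 256 bytes, which is not necessarily the exact "partial entropy" method discussed in this thread. The function name and the SAMPLE_SIZE constant are made up for the example.

```python
import math
from collections import Counter

# Assumption: 256 comes from "about 256 bytes" above; tune as needed.
SAMPLE_SIZE = 256


def sample_entropy(data: bytes, sample_size: int = SAMPLE_SIZE) -> float:
    """Shannon entropy, in bits per byte, of the first sample_size bytes."""
    sample = data[:sample_size]
    if not sample:
        return 0.0
    n = len(sample)
    counts = Counter(sample)  # how often each byte value occurs in the sample
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

A value close to 8 bits per byte suggests the data is already compressed (or encrypted); plain text usually scores much lower. Where to put the cut-off is something you would have to find by testing against real binaries.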

> Maybe ultimately we could keep both heuristics.

I agree, though not to speed things up: the point is to avoid false 
positives and false negatives. Auto-detection is far from perfect.

> Start with the content type detection that would match against MIME types we 
> know for sure are compressed (expected to be a reasonably fixed and short 
> list of MIME types).

I would probably use the following logic:

* list of MIME types that should be compressed (text/plain and so on)
* list of MIME types that should not be compressed (application/zip, 
application/java-archive, and so on)

For the remainder, and whenever you don't know the MIME type, I would use 
auto-detection.
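
The logic above could be sketched roughly like this. The two sets only contain the examples from this mail, and should_compress as well as the looks_compressed callback (whatever auto-detection heuristic you end up using) are hypothetical names, not existing API.

```python
from typing import Callable, Optional

# Assumption: illustrative lists only; real lists would be longer.
COMPRESS = {"text/plain"}
DO_NOT_COMPRESS = {"application/zip", "application/java-archive"}


def should_compress(mime_type: Optional[str], data: bytes,
                    looks_compressed: Callable[[bytes], bool]) -> bool:
    """Decide by MIME type first, then fall back to auto-detection."""
    if mime_type is not None:
        # Drop parameters such as "; charset=utf-8" and normalize case.
        base = mime_type.split(";", 1)[0].strip().lower()
        if base in COMPRESS:
            return True
        if base in DO_NOT_COMPRESS:
            return False
    # Unknown or missing MIME type: test a small sample of the data.
    return not looks_compressed(data[:256])
```

For example, should_compress("application/zip", data, detector) returns False without calling the detector at all, while a missing MIME type always goes through auto-detection.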

Regards,
Thomas

