On Fri, Oct 03, 2014 at 11:26:32AM +0400, Navrotskiy Artem wrote:
>    Hello,
> 
> 
> 
>    The Subversion console client tries to detect binary files with this algorithm:
> 
>     1. File is NOT BINARY if it contains only the UTF-8 BOM signature (why not
>        check whether the first N bytes are correct UTF-8?);
>     2. File is BINARY if the first 1024 bytes contain a ZERO byte (with a
>        uniform distribution of bytes, the chance of no ZERO byte is
>        (1 - 1 / 256) ^ 1024 = ~1.8%);
>     3. File is BINARY if over 85% of the first 1024 bytes are characters
>        not in the ranges 0x07-0x0D, 0x20-0x7F (in total there are 153
>        "binary" byte values, ~60% of all 256).
> 
>    This algorithm looks broken.
> 

Can you suggest a better algorithm?
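
To make the discussion concrete, here is a minimal Python sketch of the three
rules exactly as they are described in the quoted message. It is not taken from
the Subversion source; the names (looks_binary, is_text_byte, and so on) are
made up for illustration, and treating an empty file as text is an assumption.
The two helper functions at the end only re-check the arithmetic quoted above.

```python
UTF8_BOM = b"\xef\xbb\xbf"
SAMPLE_SIZE = 1024          # number of leading bytes inspected, per the quoted rules
BINARY_THRESHOLD = 0.85     # threshold as stated in the quoted rule 3


def is_text_byte(b: int) -> bool:
    """True for bytes in the ranges 0x07-0x0D and 0x20-0x7F."""
    return 0x07 <= b <= 0x0D or 0x20 <= b <= 0x7F


def looks_binary(data: bytes) -> bool:
    """Apply the three quoted rules in order (hypothetical helper)."""
    # Rule 1: a file consisting only of the UTF-8 BOM is not binary.
    if data == UTF8_BOM:
        return False
    sample = data[:SAMPLE_SIZE]
    # Assumption: an empty file is treated as text.
    if not sample:
        return False
    # Rule 2: any zero byte in the sample means binary.
    if 0 in sample:
        return True
    # Rule 3: binary if the share of non-text bytes exceeds the threshold.
    non_text = sum(1 for b in sample if not is_text_byte(b))
    return non_text / len(sample) > BINARY_THRESHOLD


def chance_no_zero_byte(n: int = SAMPLE_SIZE) -> float:
    """Probability that n uniformly random bytes contain no zero byte."""
    return (1 - 1 / 256) ** n


def binary_byte_count() -> int:
    """How many of the 256 byte values fall outside the text ranges."""
    return sum(1 for b in range(256) if not is_text_byte(b))
```

With these definitions, binary_byte_count() gives 153 and chance_no_zero_byte()
comes out at roughly 0.018, matching the figures in the quoted message. Note
that without rule 1 a BOM-only file would fall through to rule 3 and be
classified as binary, since all three BOM bytes are outside the text ranges.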
