> Instead of counting characters over 127, the only test is that the first
> 511 bytes don't contain any of the control characters 0-8 or 14-31. No
> normal text file would contain these.
>
> Assuming that binary data is random, the probability of an incorrectly
> tagged binary would be
>
> ((256-8-18)/256)^511=.00000000000000000000000170726
>
> Just testing 127 bytes would be a bit too little:
>
> ((256-8-18)/256)^127=.00000123868
This is a very interesting idea.
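To make the proposed check concrete, here is a minimal Python sketch. GLOBAL itself is written in C, and the names `CONTROL_BYTES`, `looks_binary` and `false_text_probability` are only illustrative, not taken from the gtags source. It scans the first 511 bytes for the control values 0-8 and 14-31 and repeats the probability estimate under the same random-data assumption. Note that the range 0-8 holds nine values, so the sketch counts 27 control bytes by default; passing 26 reproduces the quoted figures.

```python
# Control bytes named in the proposal: 0-8 and 14-31.
# Bytes 9-13 (tab, LF, VT, FF, CR) are left out because text files use them.
CONTROL_BYTES = frozenset(range(0, 9)) | frozenset(range(14, 32))


def looks_binary(data: bytes, size: int = 511) -> bool:
    """Return True if any of the first `size` bytes is a listed control byte."""
    return any(b in CONTROL_BYTES for b in data[:size])


def false_text_probability(size: int, n_control: int = len(CONTROL_BYTES)) -> float:
    """Chance that `size` uniformly random bytes contain no listed control byte."""
    return ((256 - n_control) / 256) ** size


if __name__ == "__main__":
    print(looks_binary(b"int main(void)\n{\n\treturn 0;\n}\n"))  # plain C source
    print(looks_binary(b"\x7fELF\x02\x01\x01\x00"))              # start of an ELF header
    print(false_text_probability(511))      # 27 control values
    print(false_text_probability(511, 26))  # 26, as in the quoted formula
```

With either count the estimate for 511 bytes stays far below one in a million, so the choice between 26 and 27 does not change the conclusion.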
> One of the benefits is that this will correctly tag files in Unicode as
> text as well, since those control characters never appear in Unicode
> either.
This is a big merit.
Most other multi-byte character sets are sure to be designed like that.
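As a rough way to check that expectation for a given encoding, the following Python sketch encodes one sample line and lists which of the control bytes 0-8 and 14-31 appear in the result. The sample string and the helper name `flagged_bytes` are made up for illustration. UTF-8, EUC-JP and Shift_JIS keep their extra bytes above the control range, but UTF-16 stores ASCII characters with a zero byte and ISO-2022-JP switches character sets with ESC (27), so files in those two encodings might still be tagged as binary.

```python
# Control bytes named in the proposal: 0-8 and 14-31.
CONTROL_BYTES = frozenset(range(0, 9)) | frozenset(range(14, 32))

# Hypothetical sample line mixing ASCII and Japanese text.
SAMPLE = "int main 日本語\n"


def flagged_bytes(text: str, codec: str) -> list:
    """Return the sorted control byte values found in `text` encoded with `codec`."""
    return sorted({b for b in text.encode(codec) if b in CONTROL_BYTES})


if __name__ == "__main__":
    for codec in ("utf-8", "euc_jp", "shift_jis", "utf-16-le", "iso2022_jp"):
        # An empty list means the new test would treat the line as text.
        print(codec, flagged_bytes(SAMPLE, codec))
```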
I would like to make the 512 a customizable variable too.
$ gtags ...            use conventional test

[File gtags.conf]
+----------------------------
|...
| :binarytest_size=512:... ------------+
|                                      |
                                       v
$ gtags ...            use new test using the first n=512 bytes
After testing for a while, we can decide what we should do.
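A minimal sketch of how the switch could behave, in Python for brevity. It assumes the proposed `binarytest_size` field appears in a colon-separated gtags.conf entry as drawn above. The helper names are hypothetical, and the details of the real configuration parser (escaped colons, line continuation) are not modelled.

```python
from typing import Optional


def binarytest_size(entry: str) -> Optional[int]:
    """Return the value of the proposed `binarytest_size` field, or None if absent."""
    for field in entry.split(":"):
        name, sep, value = field.strip().partition("=")
        if sep and name == "binarytest_size":
            return int(value)
    return None


def describe_test(entry: str) -> str:
    """Say which binary test would run for the given configuration entry."""
    size = binarytest_size(entry)
    if size is None:
        return "conventional test"
    return f"new test using the first {size} bytes"


if __name__ == "__main__":
    print(describe_test(""))                       # variable not set
    print(describe_test(":binarytest_size=512:"))  # variable set
```

Leaving the variable unset keeps the current behaviour, so the new test can be tried without affecting other users.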
Thank you for your helpful suggestion.
--
Shigio YAMAGUCHI <[email protected]>
PGP fingerprint: D1CB 0B89 B346 4AB6 5663 C4B6 3CA5 BBB3 57BE DDA3
_______________________________________________
Bug-global mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-global