Hi.

> I would like to make the 512 a customizable variable too.

I measured the performance using the attached test program, and confirmed that the execution time does not differ significantly for sizes smaller than 512: from 32 to 512 bytes the elapsed time only rises from 0.30 to 0.34 seconds, while 8192 bytes takes 0.75 seconds. I think that 512 is an appropriate value.

$ foreach a ( 32 64 128 256 512 1024 2048 4096 8192 )
foreach? rm -fr linux-2.6.31; tar xfj ~/download/linux/linux-2.6.31.tar.bz2; sync
foreach? time sh -c 'find linux-2.6.31 -type f | test_isbinary '$a' > /dev/null'
foreach? end
0.060u 0.348s 0:00.30 133.3% 0+0k 0+0io 0pf+0w
0.048u 0.344s 0:00.31 122.5% 0+0k 0+0io 0pf+0w
0.088u 0.340s 0:00.32 131.2% 0+0k 0+0io 0pf+0w
0.076u 0.364s 0:00.32 134.3% 0+0k 0+0io 0pf+0w
0.084u 0.372s 0:00.34 132.3% 0+0k 0+0io 0pf+0w
0.112u 0.368s 0:00.37 127.0% 0+0k 0+0io 0pf+0w
0.152u 0.368s 0:00.42 121.4% 0+0k 0+0io 0pf+0w
0.260u 0.368s 0:00.51 121.5% 0+0k 0+0io 0pf+0w
0.388u 0.368s 0:00.75 98.6% 0+0k 0+0io 0pf+0w

On Sat, 21 Nov 2009 15:42:11 +0900, Shigio YAMAGUCHI wrote...
> > Instead of counting characters over 127 the only test is that the first
> > 511 bytes don't contain any of the controll characters 0-8, 14-31. No
> > normal textfile would contain these.
> >
> > Assuming that binary data is random the probability of a incorrectly
> > tagged binary would be
> >
> > ((256-8-18)/256)^511=.00000000000000000000000170726
> >
> > just testing 127 bits would be a bit to little
> >
> > ((256-8-18)/256)^127=.00000123868
>
> This is a very interesting idea.
>
> > One of the benefits is that this will correctly tag files in uni-code as
> > text as well. Since those control characters never appears in uni-code
> > either.
>
> This is a big merit.
> Most other multi-byte character set are sure to be designed like that,
>
> I would like to make the 512 a customizable variable too.
>
>     $ gtags ...        use conventional test
>
>     [File gtags.conf]
>     +----------------------------
>     |...
>     | :binarytest_size=512:... ----------------------------------+
>     |                                                            |
>                                                                  v
>     $ gtags ...        use new test using the first n=512 bytes
>
> After testing for a while, we can decide what we should do.
> Thank you for your profitable consideration.
> --
> Shigio YAMAGUCHI <[email protected]>
> PGP fingerprint: D1CB 0B89 B346 4AB6 5663 C4B6 3CA5 BBB3 57BE DDA3
>
>
> _______________________________________________
> Bug-global mailing list
> [email protected]
> http://lists.gnu.org/mailman/listinfo/bug-global

----
Hideki IWAMOTO [email protected]

20091124-test_isbinary.patch
Description: Binary data
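
The attached patch is not reproduced in the archive. As a rough illustration only, the test described in the quoted message (no byte in the ranges 0-8 or 14-31 among the first n bytes) might be sketched in C as follows. The function names has_binary_control and file_looks_binary, the 8192-byte buffer cap, and the return conventions are assumptions made for this sketch; they are not taken from the patch or from GLOBAL's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (name is an assumption of this sketch).
 * Returns 1 if any of the first `len` bytes of `buf` is a control
 * character in the ranges 0-8 or 14-31, otherwise 0.
 * Tab, LF, VT, FF and CR (9-13) and all bytes >= 32 are accepted,
 * so multi-byte encodings with high-bit bytes are not rejected.
 */
int has_binary_control(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c <= 8 || (c >= 14 && c <= 31))
            return 1;
    }
    return 0;
}

/*
 * Hypothetical wrapper (name and return convention are assumptions).
 * Examines at most `limit` leading bytes of the file at `path`;
 * `limit` is capped at the size of the local buffer.
 * Returns 1 for binary, 0 for text, -1 if the file cannot be opened.
 */
int file_looks_binary(const char *path, size_t limit)
{
    unsigned char buf[8192];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    if (limit > sizeof(buf))
        limit = sizeof(buf);
    n = fread(buf, 1, limit, fp);
    fclose(fp);
    return has_binary_control(buf, n);
}
```

With this sketch, file_looks_binary(path, 512) would correspond to the 512-byte setting measured above. Note that the two ranges cover 9 + 18 = 27 byte values, so for uniformly random data each byte passes with probability 229/256, slightly lower than the 230/256 used in the quoted estimate; the conclusion that about 512 bytes is ample is unaffected.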
