Hi Guus,

On Mon, Jun 27, 2016 at 10:56:14PM +0200, Guus Sliepen wrote:
>
> I hope you will fix this description. I'd only keep the last paragraph,

Done.

> and then also explain what algorithm it actually uses to measure the
> entropy (Shannon's source coding theorem). This theorem is actually only
> usable in the context of an input of "independent and identically
> distributed random variables", it does not apply to every kind of input.
> In particular, it only looks at the histogram of byte values; if you
> feed it a file with totally predictable increasing byte values 0, 1, 2,
> etc., it will report an entropy of 8. Many compression algorithms,
> especially those for sound and images, look at differences between
> consecutive values or have other means to detect such predictable
> sequences. So make it clear that it just implements Shannon's H function
> and that it also only works on bytes.

I'd be happy if you would commit a fix to Git (it's writable by any DD),
since you obviously know more about this than me.

> I also want to point out that this library is not thread-safe, something
> which could easily be fixed.

A patch would be really welcome.

> It also gives the wrong answer when you
> have an input with more than 2^31-1 of the same bytes in the input, even
> though it pretends to handle inputs up to 2^63 in length.

I think this information should be in README.Debian. What do you think?

> > Remark: The code of libdisorder appeared in two other targets of Debian
> > Med and to avoid code duplication this library is packaged separately.
>
> Although normally I would applaud deduplication, I personally think this
> shouldn't get its own package. It looks like one of those things you'd
> find npm.

I think I'll stick to this separate library approach.

Thanks a lot for your comments

      Andreas.

-- 
http://fam-tille.de
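
For illustration, the histogram-only behaviour Guus describes can be
sketched as below. This is a minimal Python sketch of Shannon's H function
over byte values, not the libdisorder code itself; the function name
byte_entropy and the sample inputs are assumptions made up for this example.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon's H in bits per byte, computed from the byte histogram only.

    Hypothetical helper for illustration; it ignores the order of the bytes,
    so it cannot detect predictable sequences.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # histogram: byte value -> number of occurrences
    # H = sum over observed byte values of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Totally predictable ramp 0, 1, 2, ..., 255, repeated four times.
# Every byte value occurs equally often, so the histogram is flat.
ramp = bytes(range(256)) * 4

# Constant input: the histogram has a single non-empty bin.
constant = b"\x00" * 1024

print(byte_entropy(ramp))      # 8.0
print(byte_entropy(constant))  # 0.0
```

The ramp gets the maximum value of 8 bits per byte although a compressor
that looks at differences between consecutive values would shrink it to
almost nothing, which is why the description should say that only the
byte histogram is measured.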