Hi Guus,

On Mon, Jun 27, 2016 at 10:56:14PM +0200, Guus Sliepen wrote:
> 
> I hope you will fix this description. I'd only keep the last paragraph,

Done.

> and then also explain what algorithm it actually uses to measure the
> entropy (Shannon's source coding theorem). This theorem is actually only
> usable in the context of an input of "independent and identically
> distributed random variables", it does not apply to every kind of input.
> In particular, it only looks at the histogram of byte values; if you
> feed it a file with totally predictable increasing byte values 0, 1, 2,
> etc., it will report an entropy of 8. Many compression algorithms,
> especially those for sound and images, look at differences between
> consecutive values or have other means to detect such predictable
> sequences. So make it clear that it just implements Shannon's H function
> and that it also only works on bytes.

I'd be happy if you would commit a fix to Git (it's writable by any DD),
since you obviously know more about this than I do.
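
To check that I read your point correctly, here is a minimal Python sketch
of Shannon's H function computed from the byte histogram alone. It is not
libdisorder's code; the name byte_entropy is made up for illustration.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, using only the byte histogram.

    Hypothetical helper for illustration; not part of libdisorder.
    """
    if not data:
        return 0.0
    n = len(data)
    # H = sum over byte values of -p * log2(p), with p = count / n.
    # The order of the bytes never enters the calculation.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


# A fully predictable ramp 0, 1, 2, ..., 255, repeated four times.
ramp = bytes(range(256)) * 4
# The same bytes sorted: four 0s, four 1s, and so on.
sorted_ramp = bytes(sorted(ramp))

print(byte_entropy(ramp))         # 8.0
print(byte_entropy(sorted_ramp))  # same histogram, same result
```

Both inputs are trivially predictable, yet both score the maximum of 8 bits
per byte, because every byte value occurs equally often.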
 
> I also want to point out that this library is not thread-safe, something
> which could easily be fixed.

A patch would be really welcome.

> It also gives the wrong answer when you
> have an input with more than 2^31-1 of the same bytes in the input, even
> though it pretends to handle inputs up to 2^63 in length.

I think this information should be in README.Debian.  What do you think?
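
If the cause is a signed 32-bit counter per byte value (an assumption on my
side, I have not checked the source), the effect could be sketched like this
in Python. The function name is hypothetical.

```python
def as_int32(n: int) -> int:
    """Value a signed 32-bit counter would hold after n increments.

    Assumes two's-complement wraparound; hypothetical model, not taken
    from the libdisorder source.
    """
    return (n + 2**31) % 2**32 - 2**31


limit = 2**31 - 1
print(as_int32(limit))      # 2147483647, still correct
print(as_int32(limit + 1))  # -2147483648, the count has wrapped

# A wrapped count yields a negative "probability", so any entropy
# computed from it is meaningless.
total = limit + 1
print(as_int32(total) / total)  # -1.0
```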
 
> > Remark: The code of libdisorder appeared in two other targets of Debian
> > Med and to avoid code duplication this library is packaged separately.
> 
> Although normally I would applaud deduplication, I personally think this
> shouldn't get its own package. It looks like one of those things you'd
> find in npm.

I think I'll stick to this separate library approach.

Thanks a lot for your comments.

      Andreas.

-- 
http://fam-tille.de
