On Mon, May 19, 2014 at 11:08 AM, Dmitriy Lyubimov (JIRA) <[email protected]> wrote:
> [~avati] do you think you could perhaps explain (or reference a principled
> foundational publication for) the algorithm that is happening here?

One of the most commonly effective compression techniques is dictionary + run-length encoding. The binary matrices that much of our software uses would compress massively under it. For instance, a binary vector with 1 million elements at 0.01% sparsity (about 100 nonzero entries) would compress to under 200 bytes with these techniques, even with a very naive implementation. Our current sparse representation requires about 1200 bytes for the same vector.
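To make the arithmetic concrete, here is a minimal sketch (not the patch's actual algorithm, and the class name `SparseBinaryCodec` is hypothetical) of one naive scheme: store the sorted nonzero indices as delta-encoded unsigned varints, which is effectively a run-length encoding of the zero runs in a binary vector.

```java
import java.io.ByteArrayOutputStream;

public class SparseBinaryCodec {

  // Encode sorted nonzero indices as delta-encoded unsigned varints:
  // each gap between consecutive indices is written 7 bits at a time,
  // with the high bit of each byte marking "more bytes follow".
  static byte[] encode(int[] sortedIndices) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int idx : sortedIndices) {
      int delta = idx - prev;  // the length of the zero run before this 1-bit
      prev = idx;
      while ((delta & ~0x7F) != 0) {
        out.write((delta & 0x7F) | 0x80);
        delta >>>= 7;
      }
      out.write(delta);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    // 100 nonzeros spread over 1,000,000 elements (0.01% sparsity):
    int[] indices = new int[100];
    for (int i = 0; i < 100; i++) {
      indices[i] = i * 10000;
    }
    byte[] packed = encode(indices);
    // Each gap of 10,000 fits in 2 varint bytes (and the first index, 0,
    // in 1 byte), so the whole vector packs into 199 bytes -- versus
    // roughly 100 x (4-byte int index + 8-byte double) = 1200 bytes for
    // a typical index/value sparse representation.
    System.out.println(packed.length);  // prints 199
  }
}
```

The 200-vs-1200-byte comparison above assumes uniformly spread indices; clustered indices would compress even better, since small gaps need only one byte each.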
