Compression is likely to help with things like binary matrices or matrices
of small counts.  Using a binary or ternary random projection will preserve
that compressibility for one step, but as soon as we are into the first QR
step, I expect that property will be lost.

This is the long way of saying that I agree.
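
To make that concrete, here is a toy sketch (plain Java with illustrative names,
not the actual Mahout SSVD code, and with a naive Gram-Schmidt standing in for
the real QR step): the product of a count matrix with a {-1, 0, +1} projection
is still small integers and compresses well, but orthonormalizing the result
leaves dense doubles with little for a codec to exploit.

    import java.util.Random;

    public class CompressibilityDemo {
      public static void main(String[] args) {
        Random rnd = new Random(42);
        int n = 6, m = 5, k = 3;

        // A: a matrix of small counts (the kind of data that compresses well).
        int[][] a = new int[n][m];
        for (int[] row : a)
          for (int j = 0; j < m; j++) row[j] = rnd.nextInt(3);

        // Omega: ternary {-1, 0, +1} random projection.
        int[][] omega = new int[m][k];
        for (int[] row : omega)
          for (int j = 0; j < k; j++) row[j] = rnd.nextInt(3) - 1;

        // Y = A * Omega is still a matrix of small integers -> compressible.
        double[][] y = new double[n][k];
        for (int i = 0; i < n; i++)
          for (int j = 0; j < k; j++)
            for (int l = 0; l < m; l++) y[i][j] += a[i][l] * omega[l][j];

        // Toy Gram-Schmidt standing in for the QR step: the orthonormal
        // columns are dense irrational doubles, so the compressibility of
        // the input is gone.
        for (int j = 0; j < k; j++) {
          for (int p = 0; p < j; p++) {
            double dot = 0;
            for (int i = 0; i < n; i++) dot += y[i][j] * y[i][p];
            for (int i = 0; i < n; i++) y[i][j] -= dot * y[i][p];
          }
          double norm = 0;
          for (int i = 0; i < n; i++) norm += y[i][j] * y[i][j];
          norm = Math.sqrt(norm);
          for (int i = 0; i < n; i++) y[i][j] /= norm;
        }
        System.out.println("Q[0][0] after orthonormalization: " + y[0][0]);
      }
    }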

On Sat, Sep 3, 2011 at 2:41 AM, Dmitriy Lyubimov <[email protected]> wrote:

> Per above.
>
> I noticed I do ask for compression of results and intermediate data
> (more of a programming reflex, really, than any motivated decision).
>
> But for data such as vectors, assuming sparse vectors are used where
> appropriate, compression is not going to win much.
>
> On the other hand, if native libraries are enabled, the default GZIP codec
> does not cost much compared to the computations either.
>
> And a third option: maybe we shouldn't put any defaults in at all and
> leave it to -D options. I see that as somewhat of a problem, since
> Hadoop tries to encapsulate those properties in static
> methods of classes such as FileOutputFormat, which may imply that the
> property names are not meant to be part of any user contract and are
> just implementation details of a concrete file format.
>
> I am leaning towards enforcing no compression by default.
>
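
For reference, the two routes Dmitriy mentions look roughly like this against
the old org.apache.hadoop.mapred API (a sketch with an illustrative class name,
not the actual SSVD job setup):

    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;

    public class OutputCompressionSketch {
      // Programmatic route: the static helpers on the output format class
      // hide the underlying property names from callers.
      public static void enableGzip(JobConf conf) {
        SequenceFileOutputFormat.setCompressOutput(conf, true);
        SequenceFileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(conf,
            SequenceFile.CompressionType.BLOCK);
      }
    }

The -D route would set the raw properties behind those helpers, e.g.
-Dmapred.output.compress=true
-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
-Dmapred.output.compression.type=BLOCK, which is exactly the user-contract
question above: those names read like implementation details of the output
format rather than a stable interface.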
