The really major win would come from handling integer (especially boolean) matrices specially. Attacking the 4-byte cost of the index in a sparse vector helps, but attacking the 8-byte value would be even better. For sparse boolean matrices, the value can go away entirely.
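As a rough illustration of the per-entry arithmetic, here is a minimal sketch that compares a 4-byte index plus 8-byte value layout against an index-only layout for a boolean vector. The function names and the byte layout are assumptions made for this example, not the actual on-disk format of any library.

```python
import struct

def encode_double_vector(entries):
    """Baseline layout (assumed): 4-byte int index + 8-byte double per non-zero."""
    return b"".join(struct.pack(">id", i, v) for i, v in sorted(entries.items()))

def encode_boolean_vector(indexes):
    """Boolean layout (assumed): a stored index implies the value 1, so no value is written."""
    return b"".join(struct.pack(">i", i) for i in sorted(indexes))

def decode_boolean_vector(data):
    """Read back the 4-byte indexes written by encode_boolean_vector."""
    return [struct.unpack_from(">i", data, off)[0] for off in range(0, len(data), 4)]

entries = {3: 1.0, 17: 1.0, 42: 1.0, 1000: 1.0}
double_bytes = encode_double_vector(entries)
bool_bytes = encode_boolean_vector(entries.keys())

# 12 bytes per entry with values, 4 bytes per entry without
print(len(double_bytes), len(bool_bytes))
```

With four non-zero entries, the value-carrying layout takes 12 bytes per entry and the index-only layout takes 4, which is the saving described above.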
All of these efforts will make any downstream compression less valuable, so its gains will look much less impressive. The exception is delta encoding of indexes, which will probably make the downstream compressor more effective.

On Sun, May 2, 2010 at 12:45 PM, Drew Farris <drew.far...@gmail.com> wrote:

> Does anyone have any idea whether greater gains are to be found by finely tuning
> the base encoding vs. relying on some form of SequenceFile block
> compression? (or do both approaches complement each other nicely?)
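On the delta-encoding point in the reply above, here is a minimal sketch of what that could look like: sorted indexes are stored as gaps from the previous index, and each gap is written as a variable-length integer. The varint format and the function names are assumptions for illustration, and the sketch assumes non-negative indexes.

```python
def write_varint(n, out):
    """Append n (non-negative) to out using 7 bits per byte, low bits first."""
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)

def encode_deltas(indexes):
    """Store each sorted index as the gap from the previous one."""
    out = bytearray()
    prev = 0
    for i in sorted(indexes):
        write_varint(i - prev, out)
        prev = i
    return bytes(out)

def decode_deltas(data):
    """Rebuild the original indexes by summing the decoded gaps."""
    indexes, prev, cur, shift = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += cur
            indexes.append(prev)
            cur, shift = 0, 0
    return indexes

indexes = [3, 17, 42, 1000, 1003]
encoded = encode_deltas(indexes)
print(len(encoded), "bytes instead of", 4 * len(indexes))
```

Small gaps fit in a single byte each, so the encoded stream is dominated by a few small, frequently repeated byte values rather than steadily increasing 4-byte integers.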