Even in 1.4, though, GZip helps.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Wed, Feb 19, 2014 at 11:40 AM, Keith Turner <[email protected]> wrote:
> On Tue, Feb 18, 2014 at 10:14 PM, Mike Drob <[email protected]> wrote:
>
>> The column visibility is stored as bytes on disk, derived from the entire
>> visibility expression. In theory this may seem like a lot of space, but in
>> practice it turns out to be fine for a couple of reasons.
>>
>> First, RFiles employ relative key encoding, so if the visibility is the
>> same in two consecutive keys, then the second one is simply omitted.
>>
>
> In 1.5, common prefixes in consecutive key fields may be compressed away.
> In 1.4 the entire field had to match.
>
> https://issues.apache.org/jira/browse/ACCUMULO-790
>
>
>> Also, RFiles use gz encoding by default. If you have a few similar
>> (repeated) text strings to represent your visibilities, then they will
>> compress very well.
>>
>> However, if you have lots of different visibilities, then you may not end
>> up gaining much from the storage tricks we employ.
>>
>> Mike
>>
>>
>> On Mon, Feb 17, 2014 at 10:30 PM, Sitaraman Vilayannur <[email protected]> wrote:
>>
>> > Hi,
>> > How are the column visibility elements stored in Accumulo? Is there a
>> > kind of compression that is used to save space, or are all the elements
>> > for each key-value pair stored as is?
>> > A pointer to the region of the code that I should look at for the
>> > implementation will also be helpful.
>> > Thanks
>> > Sitaraman
>> >
>>
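
For anyone who wants to get a feel for the two effects discussed in this thread, here is a rough Python sketch. It is a toy model, not Accumulo's RFile code: the function names are made up for illustration, the per-key bookkeeping bytes that a real relative-key format needs are ignored, and Python's gzip module merely stands in for the gz codec applied to RFile blocks. It counts how many visibility bytes could be skipped when a field must match the previous key entirely (the 1.4 behavior described above) versus when only a common prefix is needed (the 1.5 behavior), and then compares gzip ratios for repeated and varied visibility strings.

```python
import gzip
import random


def whole_field_savings(visibilities):
    """Bytes skipped when a visibility is omitted only if it equals
    the previous key's visibility exactly (toy model of 1.4)."""
    saved = 0
    prev = None
    for vis in visibilities:
        if vis == prev:
            saved += len(vis)
        prev = vis
    return saved


def common_prefix_len(a, b):
    """Length of the shared leading bytes of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_savings(visibilities):
    """Bytes skipped when the prefix shared with the previous key's
    visibility is dropped (toy model of 1.5)."""
    saved = 0
    prev = b""
    for vis in visibilities:
        saved += common_prefix_len(prev, vis)
        prev = vis
    return saved


def gzip_ratio(visibilities):
    """Compressed size divided by raw size for the joined visibilities."""
    raw = b"\n".join(visibilities)
    return len(gzip.compress(raw)) / len(raw)


# Hypothetical visibility expressions, already in key order.
sample = [b"A&B", b"A&B", b"A&C", b"A&C", b"D"]

# Many keys sharing one expression versus many distinct expressions.
repeated = [b"(admin&audit)|ops"] * 1000
rng = random.Random(0)  # fixed seed so runs are repeatable
varied = [("%016x" % rng.getrandbits(64)).encode() for _ in range(1000)]

if __name__ == "__main__":
    print("whole-field savings:", whole_field_savings(sample))  # 6
    print("prefix savings:     ", prefix_savings(sample))       # 8
    print("gzip ratio, repeated:", round(gzip_ratio(repeated), 3))
    print("gzip ratio, varied:  ", round(gzip_ratio(varied), 3))
```

On the small sample, prefix matching skips a couple more bytes than whole-field matching because "A&C" shares "A&" with the preceding "A&B". The repeated list compresses to a small fraction of its raw size, while the varied list compresses far less, which matches Mike's point about having lots of different visibilities.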
