On Mon, Feb 8, 2021 at 8:44 PM Gopal V <[email protected]> wrote:

>
> > Reason to stay sorted:
> >
> > 1. Searching for values in the dictionaries can use binary search.
>
> We did get some compression advantages from this in the past, but the
> write-throughput is hurt by this one factor both on memory bloat and cpu.
>
> The alternative to sorting is to add an order vector to the dictionary
> after it gets built, which doesn't help compression with small windows,
> but can bring back binary search improvements if we ever want it in the
> format.
>
> > Sorting the dictionary means that we need to hold all of the values
>
> I think we still need to hold values unless we allow duplicate entries
> in the dictionary after a partial flush, but can hold them in contiguous
> memory instead of a linked structure.
>

We can hold them in a compressed stream rather than in arrays of longs.

.. Owen

> > Reasons to stop sorting:
>
> +1
>
> Making it optional with a stream or flag to mark it would be enough
> instead of forcing first insert slowness & that doesn't have to happen
> immediately.
>
> Cheers,
> Gopal
>
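
For reference, here is a minimal sketch of the "order vector" alternative mentioned in the quoted text: the dictionary stays in insertion order, and a separate int array lists the entry ids in sorted order so lookups can still use binary search. This is only an illustration, not the ORC writer or reader code. The class and method names (OrderVectorSketch, buildOrder, lookup) are hypothetical, and it assumes plain String.compareTo ordering and an in-memory String[] rather than whatever encoding the format would actually use.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of ORC.
final class OrderVectorSketch {

    // Returns an order vector: order[i] is the id (insertion position)
    // of the i-th smallest dictionary entry. The dictionary is not moved.
    static int[] buildOrder(String[] dict) {
        Integer[] ids = new Integer[dict.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Sort the ids by the values they point at (assumed String ordering).
        Arrays.sort(ids, (a, b) -> dict[a].compareTo(dict[b]));
        int[] order = new int[ids.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = ids[i];
        }
        return order;
    }

    // Binary search over the order vector. Returns the dictionary id of
    // key, or -1 if the key is not in the dictionary.
    static int lookup(String[] dict, int[] order, String key) {
        int lo = 0;
        int hi = order.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = dict[order[mid]].compareTo(key);
            if (cmp == 0) {
                return order[mid];
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] dict = {"pear", "apple", "fig"}; // insertion order, unsorted
        int[] order = buildOrder(dict);
        System.out.println(Arrays.toString(order));
        System.out.println(lookup(dict, order, "fig"));
    }
}
```

The sort happens once, after the dictionary is built, so inserts never pay for keeping the entries ordered. The cost is one extra int per entry, and as noted in the quoted text it does nothing for compression.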
