Ugh, I mean that the data stream doesn't need to be held in an array of longs. The dictionary itself needs to stay decompressed. :)
.. Owen

On Mon, Feb 8, 2021 at 8:57 PM Owen O'Malley <[email protected]> wrote:

>
>
> On Mon, Feb 8, 2021 at 8:44 PM Gopal V <[email protected]> wrote:
>
>>
>> > Reason to stay sorted:
>> >
>> > 1. Searching for values in the dictionaries can use binary search.
>>
>> We did get some compression advantages from this in the past, but the
>> write-throughput is hurt by this one factor both on memory bloat and cpu.
>>
>> The alternative to sorting is to add an order vector to the dictionary
>> after it gets built, which doesn't help compression with small windows,
>> but can bring back binary search improvements if we ever want it in the
>> format.
>>
>> > Sorting the dictionary means that we need to hold all of the values
>>
>> I think we still need to hold values unless we allow duplicate entries
>> in the dictionary after a partial flush, but can hold them in contiguous
>> memory instead of a linked structure.
>>
>
> We can hold them in a compressed stream rather than in arrays of longs.
>
> .. Owen
>
>
>> > Reasons to stop sorting:
>>
>> +1
>>
>> Making it optional with a stream or flag to mark it would be enough
>> instead of forcing first insert slowness & that doesn't have to happen
>> immediately.
>>
>> Cheers,
>> Gopal
>>
>
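
For anyone following the thread, here is a rough sketch of the "order vector" idea Gopal mentions: the dictionary keeps its values in insertion order, and a separate permutation is built once afterwards so that lookups can still use binary search. This is only an illustration under assumptions, not the ORC writer's code. The class name `OrderVectorSketch`, its methods, and the use of `String.compareTo` for ordering are all made up for the example, and nothing here reflects an actual stream layout in the format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an unsorted dictionary plus an optional order vector.
class OrderVectorSketch {
    // Values stay in insertion order; an entry's id is its position here.
    private final List<String> values = new ArrayList<>();
    // Write-side lookup so repeated values reuse the same id.
    private final Map<String, Integer> ids = new HashMap<>();
    // order[i] is the id of the i-th smallest value; built after the fact.
    private int[] order = new int[0];

    // Adds a value without any sorting work and returns its id.
    int add(String value) {
        Integer existing = ids.get(value);
        if (existing != null) {
            return existing;
        }
        int id = values.size();
        values.add(value);
        ids.put(value, id);
        return id;
    }

    // Builds the order vector once, after the dictionary is complete.
    void buildOrder() {
        Integer[] boxed = new Integer[values.size()];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        // Assumption: String.compareTo ordering, chosen only for the sketch.
        Arrays.sort(boxed, (a, b) -> values.get(a).compareTo(values.get(b)));
        order = new int[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            order[i] = boxed[i];
        }
    }

    // Binary search through the order vector; returns the id or -1.
    int find(String value) {
        int lo = 0;
        int hi = order.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = values.get(order[mid]).compareTo(value);
            if (cmp == 0) {
                return order[mid];
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }
}
```

With this sketch, adding "pear", "apple", "fig" assigns ids 0, 1, 2 in arrival order, and after `buildOrder()` a call to `find("apple")` returns 1. The sort cost moves out of the insert path, which is the trade-off discussed above: the ids themselves are not in sorted order, so any compression benefit from sorted ids is lost.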
