+1 for the new suggestions.

I agree with "Reasons to stop sorting".

Bests,
Dongjoon.
On Mon, Feb 8, 2021 at 12:58 PM Owen O'Malley wrote:
> Ugh, I mean that the data stream doesn't need to be held in an array of
> longs. The dictionary itself needs to stay decompressed. :)
>
> .. Owen
>
> On Mon, Feb 8, 2021 at 8:57 PM Owen O'Malley wrote:
>
> > On Mon, Feb 8, 2021 at 8:44 PM Gopal V wrote:
> >
> >> > Reason to stay sorted:
> >> >
> >> > 1. Searching for values in the dictionaries can use binary search.
> >>
> >> We did get some compression advantages from this in the past, but the
> >> write-throughput is hurt by this one factor both on memory bloat and cpu.
> >>
> >> The alternative to sorting is to add an order vector to the dictionary
> >> after it gets built, which doesn't help compression with small windows,
> >> but can bring back binary search improvements if we ever want it in the
> >> format.
> >>
> >> > Sorting the dictionary means that we need to hold all of the values
> >>
> >> I think we still need to hold values unless we allow duplicate entries
> >> in the dictionary after a partial flush, but can hold them in contiguous
> >> memory instead of a linked structure.
> >
> > We can hold them in a compressed stream rather than in arrays of longs.
> >
> > .. Owen
> >
> >> > Reasons to stop sorting:
> >>
> >> +1
> >>
> >> Making it optional with a stream or flag to mark it would be enough
> >> instead of forcing first insert slowness & that doesn't have to happen
> >> immediately.
> >>
> >> Cheers,
> >> Gopal
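Gopal's alternative to sorting (assign dictionary ids in insertion order, keep the values in contiguous memory rather than a linked structure, and build an order vector only after the dictionary is complete so binary search is still possible) can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not ORC's writer code or file format: the names `InsertionOrderDictionary`, `order_vector`, and `find` are hypothetical, values are assumed to be `bytes`, and a plain Python dict stands in for whatever hash table a real writer would use to de-duplicate values.

```python
class InsertionOrderDictionary:
    """Hypothetical sketch, not ORC's API: ids are assigned in insertion order."""

    def __init__(self):
        self._data = bytearray()  # all distinct values stored back to back
        self._offsets = [0]       # value i is _data[_offsets[i]:_offsets[i + 1]]
        self._ids = {}            # value -> id, used only to de-duplicate on write

    def add(self, value):
        """Return the id for value, appending it if it has not been seen yet."""
        existing = self._ids.get(value)
        if existing is not None:
            return existing
        new_id = len(self._offsets) - 1
        self._data += value
        self._offsets.append(len(self._data))
        self._ids[value] = new_id
        return new_id

    def get(self, i):
        return bytes(self._data[self._offsets[i]:self._offsets[i + 1]])

    def __len__(self):
        return len(self._offsets) - 1

    def order_vector(self):
        """Ids sorted by their value; built once, after the dictionary is complete."""
        return sorted(range(len(self)), key=self.get)


def find(dictionary, order, value):
    """Binary search through the order vector; returns the id or -1."""
    lo, hi = 0, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        if dictionary.get(order[mid]) < value:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(order) and dictionary.get(order[lo]) == value:
        return order[lo]
    return -1


if __name__ == "__main__":
    d = InsertionOrderDictionary()
    ids = [d.add(v) for v in [b"pear", b"apple", b"pear", b"fig"]]
    order = d.order_vector()
    print(ids, order, find(d, order, b"fig"))
```

Adding `b"pear", b"apple", b"pear", b"fig"` yields ids `[0, 1, 0, 2]` and an order vector of `[1, 2, 0]` (apple, fig, pear). In this sketch the write path costs one hash lookup and an append per value, with no sorting; the O(n log n) sort is paid only if the order vector is requested, which fits the idea of making it optional and marking it with a stream or flag.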
