+1 for new suggestions.
I agree with "Reasons to stop sorting".

Bests,
Dongjoon.


On Mon, Feb 8, 2021 at 12:58 PM Owen O'Malley <[email protected]>
wrote:

> Ugh, I mean that the data stream doesn't need to be held in an array of
> longs. The dictionary itself needs to stay decompressed. :)
>
> .. Owen
>
> On Mon, Feb 8, 2021 at 8:57 PM Owen O'Malley <[email protected]>
> wrote:
>
> >
> >
> > On Mon, Feb 8, 2021 at 8:44 PM Gopal V <[email protected]> wrote:
> >
> >>
> >> > Reason to stay sorted:
> >> >
> >> >     1. Searching for values in the dictionaries can use binary search.
> >>
> >> We did get some compression advantages from this in the past, but the
> >> write-throughput is hurt by this one factor both on memory bloat and
> >> CPU.
> >>
> >> The alternative to sorting is to add an order vector to the dictionary
> >> after it gets built, which doesn't help compression with small windows,
> >> but can bring back binary search improvements if we ever want it in the
> >> format.
> >>
> >>  > Sorting the dictionary means that we need to hold all of the values
> >>
> >> I think we still need to hold values unless we allow duplicate entries
> >> in the dictionary after a partial flush, but can hold them in contiguous
> >> memory instead of a linked structure.
> >>
> >
> > We can hold them in a compressed stream rather than in arrays of longs.
> >
> > .. Owen
> >
> >
> >>  > Reasons to stop sorting:
> >>
> >> +1
> >>
> >> Making it optional with a stream or flag to mark it would be enough
> >> instead of forcing first insert slowness & that doesn't have to happen
> >> immediately.
> >>
> >> Cheers,
> >> Gopal
> >>
> >
>
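
For readers following the thread, the "order vector" alternative Gopal describes might look roughly like the sketch below. It is only an illustration under stated assumptions, not ORC's writer code or file format: the names InsertionOrderDictionary, build_order_vector and lookup are hypothetical, values are plain strings, and the order vector is simply the list of dictionary ids sorted by their values, computed once after the dictionary is built.

```python
from typing import Dict, List, Optional


class InsertionOrderDictionary:
    """Hypothetical unsorted dictionary: ids are assigned in first-insert order."""

    def __init__(self) -> None:
        self.values: List[str] = []      # contiguous storage, id -> value
        self._ids: Dict[str, int] = {}   # value -> id, avoids duplicate entries

    def add(self, value: str) -> int:
        """Return the id for value, appending it only on first sight."""
        existing = self._ids.get(value)
        if existing is not None:
            return existing
        new_id = len(self.values)
        self.values.append(value)
        self._ids[value] = new_id
        return new_id

    def build_order_vector(self) -> List[int]:
        """Ids sorted by their values; built once, after all inserts."""
        return sorted(range(len(self.values)), key=self.values.__getitem__)


def lookup(values: List[str], order: List[int], target: str) -> Optional[int]:
    """Binary search through the order vector; returns the id or None."""
    lo, hi = 0, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        if values[order[mid]] < target:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(order) and values[order[lo]] == target:
        return order[lo]
    return None


d = InsertionOrderDictionary()
ids = [d.add(v) for v in ["pear", "apple", "pear", "fig"]]
order = d.build_order_vector()
found = lookup(d.values, order, "fig")
missing = lookup(d.values, order, "kiwi")
```

In this sketch the insert path never sorts anything, so the sorting cost moves to a single pass at the end, while a reader that has only the values and the order vector can still use binary search.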
