On Mon, Feb 8, 2021 at 8:44 PM Gopal V <[email protected]> wrote:

>
> > Reason to stay sorted:
> >
> >     1. Searching for values in the dictionaries can use binary search.
>
> We did get some compression advantages from this in the past, but the
> write-throughput is hurt by this one factor both on memory bloat and cpu.
>
> The alternative to sorting is to add an order vector to the dictionary
> after it gets built, which doesn't help compression with small windows,
> but can bring back binary search improvements if we ever want it in the
> format.
>
>  > Sorting the dictionary means that we need to hold all of the values
>
> I think we still need to hold values unless we allow duplicate entries
> in the dictionary after a partial flush, but can hold them in contiguous
> memory instead of a linked structure.
>

We can hold them in a compressed stream rather than in arrays of longs.

.. Owen


>  > Reasons to stop sorting:
>
> +1
>
> Making it optional with a stream or flag to mark it would be enough
> instead of forcing first insert slowness & that doesn't have to happen
> immediately.
>
> Cheers,
> Gopal
>

Reply via email to