Reason to stay sorted:

    1. Searching for values in the dictionaries can use binary search.

We did get some compression advantages from this in the past, but the write-throughput is hurt by this one factor both on memory bloat and cpu.

The alternative to sorting is to add an order vector to the dictionary after it gets built, which doesn't help compression with small windows, but can bring back binary search improvements if we ever want it in the format.

> Sorting the dictionary means that we need to hold all of the values

I think we still need to hold values unless we allow duplicate entries in the dictionary after a partial flush, but can hold them in contiguous memory instead of a linked structure.

> Reasons to stop sorting:

+1

Making it optional with a stream or flag to mark it would be enough instead of forcing first insert slowness & that doesn't have to happen immediately.

Cheers,
Gopal

Reply via email to