Ugh, I mean that the data stream doesn't need to be held in an array of
longs. The dictionary itself needs to stay decompressed. :)

.. Owen

On Mon, Feb 8, 2021 at 8:57 PM Owen O'Malley <[email protected]> wrote:

>
>
> On Mon, Feb 8, 2021 at 8:44 PM Gopal V <[email protected]> wrote:
>
>>
>> > Reason to stay sorted:
>> >
>> >     1. Searching for values in the dictionaries can use binary search.
>>
>> We did get some compression advantages from this in the past, but
>> write throughput is hurt by this one factor, in both memory bloat and CPU.
>>
>> The alternative to sorting is to add an order vector to the dictionary
>> after it gets built, which doesn't help compression with small windows,
>> but can bring back binary search improvements if we ever want it in the
>> format.
>>
>>  > Sorting the dictionary means that we need to hold all of the values
>>
>> I think we still need to hold values unless we allow duplicate entries
>> in the dictionary after a partial flush, but can hold them in contiguous
>> memory instead of a linked structure.
>>
>
> We can hold them in a compressed stream rather than in arrays of longs.
>
> .. Owen
>
>
>>  > Reasons to stop sorting:
>>
>> +1
>>
>> Making it optional with a stream or flag to mark it would be enough
>> instead of forcing first insert slowness & that doesn't have to happen
>> immediately.
>>
>> Cheers,
>> Gopal
>>
>
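
Editor's sketch of the "order vector" alternative mentioned above, assuming
a dictionary that assigns ids in first-seen order and only computes a sorted
permutation once, after it is built. The class and method names
(InsertionOrderDictionary, build_order_vector, find) are hypothetical and
are not part of the ORC format or its writer; this only illustrates how
binary search could be recovered without sorting on insert.

```python
class InsertionOrderDictionary:
    """Hypothetical dictionary: ids follow first-seen order, no sort on insert."""

    def __init__(self):
        self.values = []   # id -> value, stored contiguously in insertion order
        self._ids = {}     # value -> id, used to de-duplicate while writing
        self.order = []    # order vector: ids listed in sorted-value order

    def add(self, value):
        """Return the id for value, assigning a new one on first sight."""
        ident = self._ids.get(value)
        if ident is None:
            ident = len(self.values)
            self._ids[value] = ident
            self.values.append(value)
        return ident

    def build_order_vector(self):
        """Sort ids by their values once, after the dictionary is complete."""
        self.order = sorted(range(len(self.values)),
                            key=self.values.__getitem__)
        return self.order

    def find(self, value):
        """Binary search through the order vector; return the id or -1."""
        lo, hi = 0, len(self.order)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.values[self.order[mid]] < value:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(self.order) and self.values[self.order[lo]] == value:
            return self.order[lo]
        return -1


d = InsertionOrderDictionary()
ids = [d.add(v) for v in ["pear", "apple", "pear", "fig"]]
order = d.build_order_vector()
```

The cost of sorting moves from every insert to a single pass at flush time,
and a reader that never needs lookups could skip the order vector entirely.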
