I think I shared the formula for the threshold a long time ago. Let me dig it
up and share it.

> On Feb 11, 2020, at 2:33 PM, kishore g <[email protected]> wrote:
> 
> Any idea on what should be the threshold?
> 
> Good point about the variable-length column. Should we rely on a bloom
> filter instead?
> 
>> On Tue, Feb 11, 2020 at 2:14 PM Siddharth Teotia
>> <[email protected]> wrote:
>> 
>> Yes. Especially if the data is fixed width and high cardinality, dictionary
>> encoding is not going to be very useful.
>> 
>> Maybe for fixed width, we should create a dictionary only if the
>> cardinality is below a certain threshold?
>> 
>> For variable width, whether cardinality is high or low, dictionary
>> encoding will improve filter processing if the column is used heavily in
>> filters. So maybe for variable width we should always create a dictionary
>> unless indicated otherwise in the table config?
>> ________________________________
>> From: kishore g <[email protected]>
>> Sent: Tuesday, February 11, 2020 2:10 PM
>> To: [email protected] <[email protected]>
>> Subject: Convert dictionary encoded into raw
>> 
>> As of today, we apply dictionary encoding to all columns by default. We
>> should probably move to a hybrid approach where we decide the encoding
>> based on the data profile. For example, if the cardinality of the column is
>> very high (which is the case for metrics), dictionary encoding does not
>> provide a lot of value.
>> 
>> Thoughts?
>> 
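
One way to frame the fixed-width threshold question raised in this thread is a
plain storage-size comparison: build a dictionary only when the dictionary plus
the bit-packed dictionary ids would take less space than the raw values. The
sketch below is an illustration under that assumption. It is not the formula
mentioned in the reply above, and it is not taken from the Pinot code base; all
function and parameter names are hypothetical.

```python
def dictionary_size_bytes(num_docs: int, cardinality: int, value_width: int) -> int:
    """Estimated size of a dictionary-encoded fixed-width column (assumption:
    a sorted dictionary of unique values plus a bit-packed id per document)."""
    # Bits needed to represent ids 0 .. cardinality - 1 (at least 1 bit).
    bits_per_id = max(1, (cardinality - 1).bit_length())
    forward_index = (num_docs * bits_per_id + 7) // 8  # round up to whole bytes
    dictionary = cardinality * value_width
    return forward_index + dictionary


def raw_size_bytes(num_docs: int, value_width: int) -> int:
    """Estimated size of the same column stored raw, without compression."""
    return num_docs * value_width


def should_use_dictionary(num_docs: int, cardinality: int, value_width: int,
                          fixed_width: bool = True) -> bool:
    """Hypothetical decision rule following the proposals in this thread."""
    if not fixed_width:
        # Proposal from the thread: always build a dictionary for
        # variable-width columns unless the table config says otherwise.
        return True
    return (dictionary_size_bytes(num_docs, cardinality, value_width)
            < raw_size_bytes(num_docs, value_width))
```

For example, under these assumptions a 1,000,000-row, 8-byte column with 1,000
distinct values favors the dictionary, while a 4-byte metric column where every
row is distinct does not. A size-only rule ignores the filter-processing
benefit raised above, so a real threshold would likely need to weigh that too.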
