Any idea what the threshold should be?

Good point about the variable-length column. Should we rely on a bloom
filter instead?
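
To make the threshold question concrete, here is a minimal sketch of the
heuristic proposed in the quoted message below. It is not existing code: the
function name, the `force_raw` override, and the 0.5 cutoff are all
hypothetical, and measuring cardinality as a ratio of distinct values to rows
is an assumption, since the thread leaves the actual threshold open.

```python
from enum import Enum

# Hypothetical cutoff (distinct values / total rows). The thread does not
# settle on a value; 0.5 is only a placeholder for discussion.
CARDINALITY_RATIO_THRESHOLD = 0.5


class Encoding(Enum):
    DICTIONARY = "dictionary"
    RAW = "raw"


def choose_encoding(is_fixed_width, cardinality, num_docs, force_raw=False):
    """Pick an encoding for one column from its data profile (sketch only)."""
    if force_raw:
        # Explicit override, standing in for "indicated otherwise in table config".
        return Encoding.RAW
    if not is_fixed_width:
        # Variable width: always dictionary-encode unless overridden.
        return Encoding.DICTIONARY
    if num_docs == 0:
        # No data to profile; fall back to today's default.
        return Encoding.DICTIONARY
    ratio = cardinality / num_docs
    # Fixed width: dictionary only when cardinality is below the threshold.
    return Encoding.DICTIONARY if ratio < CARDINALITY_RATIO_THRESHOLD else Encoding.RAW
```

With this placeholder cutoff, a fixed-width metric with 900 distinct values in
1000 rows would be stored raw, while a variable-width column with the same
profile would still get a dictionary.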

On Tue, Feb 11, 2020 at 2:14 PM Siddharth Teotia
<[email protected]> wrote:

> Yes. Especially if the data is fixed width and high cardinality,
> dictionary encoding is not going to be very useful.
>
> Maybe for fixed width, we should create a dictionary only if cardinality is
> below a certain threshold?
>
> For variable width, whether cardinality is high or low, dictionary
> encoding will improve filter processing if the column is used heavily in
> filters. So maybe for variable width we should always create a dictionary
> unless indicated otherwise in the table config?
> ________________________________
> From: kishore g <[email protected]>
> Sent: Tuesday, February 11, 2020 2:10 PM
> To: [email protected] <[email protected]>
> Subject: Convert dictionary encoded into raw
>
> As of today, we apply dictionary encoding for all columns by default. We
> should probably move to a hybrid approach where we decide the encoding based
> on the data profile. For example, if the cardinality of the column is very
> high (which is the case for metrics), dictionary encoding does not provide
> a lot of value.
>
> Thoughts?
>
