I think I shared the formula for the threshold a long time ago. Let me dig it up and share it.

> On Feb 11, 2020, at 2:33 PM, kishore g <[email protected]> wrote:
>
> Any idea on what should be the threshold?
>
> Good point about the variable-length column, should we rely on a bloom
> filter instead?
>
>> On Tue, Feb 11, 2020 at 2:14 PM Siddharth Teotia
>> <[email protected]> wrote:
>>
>> Yes. Especially if the data is fixed width and high cardinality then
>> dictionary encoding is not going to be very useful.
>>
>> May be for fixed width, we should create dictionary only if cardinality is
>> below a certain threshold?
>>
>> For variable width, whether cardinality is high or low, dictionary
>> encoding will improve filter processing if column is used heavily in
>> filters. So may be for variable width we should always create dictionary
>> unless indicated otherwise in table config?
>> ________________________________
>> From: kishore g <[email protected]>
>> Sent: Tuesday, February 11, 2020 2:10 PM
>> To: [email protected] <[email protected]>
>> Subject: Convert dictionary encoded into raw
>>
>> As of today, we apply dictionary encoding for all columns by default. We
>> should probably move a hybrid approach where we decide the encoding based
>> on the data profile. For e.g. if the cardinality of the column is very high
>> (which is the case for metrics), dictionary encoding does not provide a lot
>> of value.
>>
>> Thoughts?
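
To make the trade-off in the thread concrete, here is a rough sketch of how a cardinality threshold for fixed-width columns could be reasoned about by comparing estimated storage sizes. This is not the formula mentioned above and not the project's actual implementation. It assumes a bit-packed forward index of dictionary ids plus one dictionary entry per distinct value, and every function and parameter name here is hypothetical. Variable-width columns default to dictionary encoding, following the suggestion in the thread.

```python
def dictionary_size_bytes(num_docs: int, cardinality: int, value_width_bytes: int) -> int:
    """Estimated size of a dictionary-encoded fixed-width column (assumed layout)."""
    # Dictionary: one stored value per distinct entry.
    dict_bytes = cardinality * value_width_bytes
    # Forward index: bit-packed ids in [0, cardinality), at least 1 bit per id.
    bits_per_id = max(1, (cardinality - 1).bit_length())
    fwd_bytes = (num_docs * bits_per_id + 7) // 8  # round up to whole bytes
    return dict_bytes + fwd_bytes


def raw_size_bytes(num_docs: int, value_width_bytes: int) -> int:
    """Estimated size of the same column stored raw, one value per document."""
    return num_docs * value_width_bytes


def should_use_dictionary(num_docs: int, cardinality: int,
                          value_width_bytes: int, is_fixed_width: bool) -> bool:
    """Hypothetical heuristic: raw only when it is smaller for fixed-width data."""
    if not is_fixed_width:
        # Thread suggestion: keep the dictionary for variable-width columns
        # unless the table config says otherwise (config check not modeled here).
        return True
    return (dictionary_size_bytes(num_docs, cardinality, value_width_bytes)
            < raw_size_bytes(num_docs, value_width_bytes))


if __name__ == "__main__":
    # 1M rows of 8-byte values: low cardinality vs. all-distinct (metric-like).
    print(should_use_dictionary(1_000_000, 100, 8, True))
    print(should_use_dictionary(1_000_000, 1_000_000, 8, True))
```

Under these assumptions, an all-distinct 8-byte column costs more with a dictionary than without, which matches the point about metrics. The sketch only models storage size; filter performance, which the thread also raises, would need a separate measure.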
