Yes. Especially if the data is fixed width and high cardinality, dictionary encoding is not going to be very useful.
Maybe for fixed width, we should create a dictionary only if the cardinality is below a certain threshold? For variable width, whether the cardinality is high or low, dictionary encoding will improve filter processing if the column is used heavily in filters. So maybe for variable width we should always create a dictionary unless indicated otherwise in the table config?

________________________________
From: kishore g <[email protected]>
Sent: Tuesday, February 11, 2020 2:10 PM
To: [email protected] <[email protected]>
Subject: Convert dictionary encoded into raw

As of today, we apply dictionary encoding for all columns by default. We should probably move to a hybrid approach where we decide the encoding based on the data profile. For example, if the cardinality of the column is very high (which is the case for metrics), dictionary encoding does not provide a lot of value. Thoughts?
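
To make the rule discussed in this thread concrete, here is a rough sketch of how the decision could look. It is only an illustration under stated assumptions: the class EncodingChooser, the enum Encoding, the maxDictCardinality threshold, and the forceRaw flag (standing in for "indicated otherwise in table config") are all hypothetical names, not existing classes or config keys, and the threshold value is arbitrary.

```java
// Sketch only: EncodingChooser, Encoding, maxDictCardinality and forceRaw are
// hypothetical names used for illustration; they are not existing APIs.
final class EncodingChooser {

    enum Encoding { DICTIONARY, RAW }

    /**
     * Picks an encoding for one column from its data profile.
     *
     * @param fixedWidth         true for fixed-width types (e.g. int, long, double)
     * @param cardinality        number of distinct values observed in the column
     * @param maxDictCardinality assumed cutoff; fixed-width columns at or above it stay raw
     * @param forceRaw           stand-in for an explicit override in the table config
     */
    static Encoding choose(boolean fixedWidth, long cardinality,
                           long maxDictCardinality, boolean forceRaw) {
        if (forceRaw) {
            // An explicit override wins over any profile-based choice.
            return Encoding.RAW;
        }
        if (!fixedWidth) {
            // Variable width: always build a dictionary, regardless of cardinality.
            return Encoding.DICTIONARY;
        }
        // Fixed width: build a dictionary only when cardinality is below the cutoff.
        return cardinality < maxDictCardinality ? Encoding.DICTIONARY : Encoding.RAW;
    }

    public static void main(String[] args) {
        long threshold = 100_000L; // arbitrary value, for the example only
        System.out.println(choose(true, 5_000_000L, threshold, false));  // RAW
        System.out.println(choose(true, 200L, threshold, false));        // DICTIONARY
        System.out.println(choose(false, 5_000_000L, threshold, false)); // DICTIONARY
        System.out.println(choose(false, 5_000_000L, threshold, true));  // RAW
    }
}
```

Whether the cutoff should be an absolute count, as above, or a ratio of distinct values to total rows is left open here; the thread only says "below a certain threshold".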
