Yes. Especially if the data is fixed width and has high cardinality, dictionary 
encoding is not going to be very useful.
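
A rough back-of-the-envelope sketch of why, assuming a dictionary that stores each distinct value once plus a bit-packed forward index of dictionary ids. The function names and the size model are illustrative assumptions, not the actual segment layout:

```python
def raw_size_bytes(num_rows: int, width: int) -> int:
    """Raw (no dictionary) storage: every row stores the full fixed-width value."""
    return num_rows * width


def dict_size_bytes(num_rows: int, cardinality: int, width: int) -> int:
    """Dictionary storage under the assumed model:
    one copy of each distinct value + bit-packed ids for every row."""
    # Bits needed to address ids 0 .. cardinality-1 (at least 1 bit).
    bits_per_id = max(1, (cardinality - 1).bit_length())
    dictionary = cardinality * width
    forward_index = (num_rows * bits_per_id + 7) // 8  # round up to whole bytes
    return dictionary + forward_index


if __name__ == "__main__":
    rows, width = 1_000_000, 8  # e.g. a long/double metric column
    for cardinality in (100, 1_000_000):
        print(cardinality, raw_size_bytes(rows, width),
              dict_size_bytes(rows, cardinality, width))
```

Under this model, once the number of distinct values approaches the number of rows, the dictionary alone is as large as the raw column and the ids are pure overhead.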

Maybe for fixed width, we should create a dictionary only if the cardinality is 
below a certain threshold?

For variable width, whether cardinality is high or low, dictionary encoding 
will improve filter processing if the column is used heavily in filters. So maybe 
for variable width we should always create a dictionary unless indicated 
otherwise in the table config?
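
To make the proposal concrete, here is a minimal sketch of the decision rule described above. The threshold value, the function name, and the `config_override` parameter are all hypothetical placeholders for illustration, not existing options:

```python
from typing import Optional

# Hypothetical cutoff; the right value would need to be chosen from real data.
FIXED_WIDTH_MAX_CARDINALITY = 10_000


def should_create_dictionary(
    is_fixed_width: bool,
    cardinality: int,
    config_override: Optional[bool] = None,
    max_cardinality: int = FIXED_WIDTH_MAX_CARDINALITY,
) -> bool:
    """Decide whether a column gets a dictionary.

    - An explicit table-config setting (assumed here as config_override) wins.
    - Variable-width columns get a dictionary by default.
    - Fixed-width columns get one only below the cardinality threshold.
    """
    if config_override is not None:
        return config_override
    if not is_fixed_width:
        return True
    return cardinality < max_cardinality


if __name__ == "__main__":
    print(should_create_dictionary(True, 500))          # low-cardinality fixed width
    print(should_create_dictionary(True, 5_000_000))    # high-cardinality metric
    print(should_create_dictionary(False, 5_000_000))   # variable width
    print(should_create_dictionary(False, 5_000_000, config_override=False))
```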
________________________________
From: kishore g <[email protected]>
Sent: Tuesday, February 11, 2020 2:10 PM
To: [email protected] <[email protected]>
Subject: Convert dictionary encoded into raw

As of today, we apply dictionary encoding for all columns by default. We
should probably move to a hybrid approach where we decide the encoding based
on the data profile. For example, if the cardinality of the column is very high
(which is the case for metrics), dictionary encoding does not provide a lot
of value.

Thoughts?
