Dear Dev team,

I asked a question several days ago about RLE and DELTA encoding in Carbon. Thank you for pointing me to the source code of the implementation.

I have read through the code and come to the following understanding. Could you please confirm whether it is correct? Thanks!

1. RLE encoding is applied only to columns that have Encoding.DICTIONARY enabled and whose cardinality is less than CarbonCommonConstants.HIGH_CARDINALITY_VALUE.

I saw that RLE encoding is applied to the data in BlockIndexerStorageForInt.compressDataMyOwnWay, and that it is controlled by aggKeyBlock, whose value is set by arrangeUniqueBlockType.
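To make my reading of item 1 concrete, here is a minimal sketch of what I think happens. It is my own simplified illustration, not the Carbon code: the class and method names (RleSketch, shouldApplyRle, encode, decode) are hypothetical, the threshold stands in for CarbonCommonConstants.HIGH_CARDINALITY_VALUE without assuming its actual value, and the (value, runLength) layout is just one common way to store runs, not necessarily the layout used in compressDataMyOwnWay.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified illustration of RLE on dictionary-encoded column keys.
// Not CarbonData's implementation; all names here are hypothetical.
public class RleSketch {

    // Hypothetical gate matching my reading of item 1: RLE only for
    // dictionary columns whose cardinality is below a threshold
    // (in Carbon, CarbonCommonConstants.HIGH_CARDINALITY_VALUE).
    static boolean shouldApplyRle(boolean isDictionary, int cardinality, int highCardinalityThreshold) {
        return isDictionary && cardinality < highCardinalityThreshold;
    }

    // Encodes keys as flat (value, runLength) pairs: [v0, n0, v1, n1, ...].
    static int[] encode(int[] keys) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < keys.length) {
            int value = keys[i];
            int run = 1;
            // Extend the run while the next key repeats the current value.
            while (i + run < keys.length && keys[i + run] == value) {
                run++;
            }
            out.add(value);
            out.add(run);
            i += run;
        }
        int[] result = new int[out.size()];
        for (int j = 0; j < result.length; j++) {
            result[j] = out.get(j);
        }
        return result;
    }

    // Expands [v0, n0, v1, n1, ...] back into the original key sequence.
    static int[] decode(int[] pairs) {
        int total = 0;
        for (int j = 1; j < pairs.length; j += 2) {
            total += pairs[j];
        }
        int[] keys = new int[total];
        int pos = 0;
        for (int j = 0; j < pairs.length; j += 2) {
            for (int k = 0; k < pairs[j + 1]; k++) {
                keys[pos++] = pairs[j];
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = {3, 3, 3, 1, 1, 2};
        System.out.println(Arrays.toString(encode(keys))); // [3, 3, 1, 2, 2, 1]
    }
}
```

My guess is that the cardinality gate exists because low-cardinality dictionary keys tend to form long runs, while high-cardinality columns would produce mostly runs of length 1, but I would like to hear the actual reasoning.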

If my understanding is correct, could you please share the reasons behind this design?

2. DELTA encoding is implemented in ValueCompressionUtil.getCompressedValues. It does not do sequential DELTA encoding, e.g., encoding a list of numbers a, b, c, ... as a, b-a, c-b, .... Instead, it does max-delta encoding: for a, b, c, ..., with maximum value M, it encodes them as M-a, M-b, M-c, ....
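To check that I am describing the difference correctly, here is a small side-by-side sketch of the two schemes. Again this is only my own illustration, not ValueCompressionUtil's code; the class and method names are hypothetical, and I assume M is stored alongside the encoded values so they can be decoded.

```java
import java.util.Arrays;

// My own simplified comparison; not ValueCompressionUtil's implementation.
public class DeltaSketch {

    // Sequential delta: a, b - a, c - b, ...
    static long[] sequentialDelta(long[] values) {
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (i == 0) ? values[0] : values[i] - values[i - 1];
        }
        return out;
    }

    // Max-delta: M - a, M - b, M - c, ... where M is the maximum value.
    // M must be kept (e.g. in metadata) to decode.
    static long[] maxDelta(long[] values, long max) {
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = max - values[i];
        }
        return out;
    }

    // Each value is recovered independently: v = M - encoded.
    static long[] decodeMaxDelta(long[] encoded, long max) {
        long[] out = new long[encoded.length];
        for (int i = 0; i < encoded.length; i++) {
            out[i] = max - encoded[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = {1000, 1003, 1001, 1007};
        long max = Arrays.stream(values).max().getAsLong();
        System.out.println(Arrays.toString(sequentialDelta(values))); // [1000, 3, -2, 6]
        System.out.println(Arrays.toString(maxDelta(values, max)));   // [7, 4, 6, 0]
    }
}
```

One difference I can see (please correct me if this is not the motivation): max-delta values are always non-negative and bounded by M minus the minimum, and each value decodes on its own, whereas sequential deltas can be negative and decoding a value needs all preceding ones.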

Could you please also explain why you chose this encoding?

Thanks!

Regards,

Hao Jiang

