Hi, I agree with option 2, but instead of a new datatype I suggest using varchar(size). There are more optimizations we can do with a varchar(size) datatype:
1. If the size is small (less than 8 bytes), we can write with a fixed-length encoder instead of LV encoding, which can save a lot of space and memory.
2. If the size is less than 32000, use our current string datatype.
3. If the size is more than 32000, encode in LV format using an int as the length.
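As a rough sketch of the three-way dispatch above (the class and enum names here are hypothetical, not CarbonData's actual encoder API), the choice between fixed-length, short-prefixed LV, and int-prefixed LV encoding could look like:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the size-based varchar encoding choice.
public class VarcharEncodingSketch {

    enum Encoding { FIXED_LENGTH, SHORT_LV, INT_LV }

    // Pick an encoding from the declared varchar(size).
    static Encoding chooseEncoding(int declaredSize) {
        if (declaredSize < 8) {
            // Small values: pad to a fixed width, no per-value length prefix.
            return Encoding.FIXED_LENGTH;
        } else if (declaredSize < 32000) {
            // Fits the current string datatype: 2-byte (short) length prefix.
            return Encoding.SHORT_LV;
        } else {
            // Long values: 4-byte (int) length prefix.
            return Encoding.INT_LV;
        }
    }

    // Encode one value, prefixing the length where LV format is used.
    static byte[] encode(byte[] value, int declaredSize) {
        switch (chooseEncoding(declaredSize)) {
            case FIXED_LENGTH: {
                // Pad to the declared size; no length prefix is stored.
                byte[] out = new byte[declaredSize];
                System.arraycopy(value, 0, out, 0, value.length);
                return out;
            }
            case SHORT_LV: {
                ByteBuffer buf = ByteBuffer.allocate(2 + value.length);
                buf.putShort((short) value.length);
                buf.put(value);
                return buf.array();
            }
            default: { // INT_LV
                ByteBuffer buf = ByteBuffer.allocate(4 + value.length);
                buf.putInt(value.length);
                buf.put(value);
                return buf.array();
            }
        }
    }
}
```

For a varchar(4) column this stores exactly 4 bytes per value with no prefix, while a varchar(100) value of 5 bytes is stored as 2 + 5 = 7 bytes, so the savings from the fixed-length path come entirely from dropping the per-value length prefix.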
In Spark DataFrame support we can use string as the default datatype. Even if we take option 1, Carbon should internally have a new datatype; otherwise the code will not be clean, as you would need to check this property in many places. Ideally a new datatype leads to its own set of implementations, which is easier to code and maintain.
