[ https://issues.apache.org/jira/browse/PHOENIX-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814561#comment-15814561 ]
Ankit Singhal commented on PHOENIX-3582:
----------------------------------------

For test #1, it's possible that when the varchar length is less than 4 characters, storing the offset is more costly than storing the actual value.

For test #2, it could be because the space the offset takes in our encoding is roughly equivalent to the key/value/prefix lengths (and the 0 timestamp diff) stored by FastDiff encoding. Also, an offset requires a more expensive data type than just storing the length.

[~mujtabachohan], can you share the absolute numbers for both test #1 and test #2 if you have them handy? It is also worth trying additional compression such as Snappy or GZ to observe the effect.

> No significant space saving with immutable encoded column with large number
> of dense columns
> --------------------------------------------------------------------------------------------
>
> Key: PHOENIX-3582
> URL: https://issues.apache.org/jira/browse/PHOENIX-3582
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Mujtaba Chohan
> Assignee: Samarth Jain
>
> Tested with 2 schemas, both with 5K varchar columns. In test #1 columns were
> named column_1 ... column_5000, whereas in test #2 column names were 10-byte
> random alphanumeric strings. Each column is filled with 15 random bytes and
> all columns have values.
> For test #1, immutable encoded columns use ~4X *more* space than non-encoded
> columns. Fast Diff encoding really shines when column names are highly
> compressible (column_1 ... column_5000).
> For test #2, the worst case where column names are not compressible since
> they are random 10-byte alphanumeric strings, immutable encoded columns use
> 25% less space.
> Data generation class is attached to
> https://issues.apache.org/jira/browse/PHOENIX-3560.
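A rough arithmetic sketch of the offset-cost argument, using the figures from the issue description (5K columns, 15-byte values). It assumes a fixed 4-byte int offset per column. This is an illustrative assumption, not Phoenix's documented on-disk layout. The sketch also ignores row keys, cell headers, and FastDiff itself, so it only shows how offset bytes scale against value bytes. It is not a model of the measured results.

```java
/**
 * Back-of-the-envelope estimate of per-row offset overhead in an
 * offset-encoded row. ASSUMPTION: each column offset is a fixed 4-byte int;
 * the real Phoenix encoding may use a different width and adds other
 * overhead (row key, cell header) that is ignored here.
 */
class OffsetOverheadEstimate {
    // Hypothetical offset width in bytes (assumed int-sized, not read from Phoenix code).
    static final int OFFSET_BYTES = 4;

    /** Bytes taken by the raw column values in one row. */
    static long valueBytes(int columns, int valueLength) {
        return (long) columns * valueLength;
    }

    /** Bytes taken by one offset per column in one row. */
    static long offsetBytes(int columns) {
        return (long) columns * OFFSET_BYTES;
    }

    /** True when a single offset costs more bytes than the value it points to. */
    static boolean offsetCostsMoreThanValue(int valueLength) {
        return OFFSET_BYTES > valueLength;
    }

    public static void main(String[] args) {
        int columns = 5000;    // 5K varchar columns, from the issue description
        int valueLength = 15;  // 15 random bytes per column, from the issue description

        long values = valueBytes(columns, valueLength);
        long offsets = offsetBytes(columns);
        System.out.println("value bytes per row:  " + values);   // 75000
        System.out.println("offset bytes per row: " + offsets);  // 20000
        System.out.printf("offset overhead: %.1f%% of value bytes%n", 100.0 * offsets / values);
        System.out.println("offset > 3-char value? " + offsetCostsMoreThanValue(3));
    }
}
```

Under these assumptions, offsets alone add about a quarter on top of the value bytes. For values shorter than the offset width, each offset outweighs the value it points to.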