[
https://issues.apache.org/jira/browse/PHOENIX-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814561#comment-15814561
]
Ankit Singhal commented on PHOENIX-3582:
----------------------------------------
For test #1,
it could be that when the varchar length is less than 4 characters, storing the
offset is costlier than storing the actual value.
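A rough byte-count sketch of the test #1 hypothesis. It assumes each encoded value is addressed by a fixed-width 4-byte offset. That figure is inferred from the "4 characters" threshold above and is not a confirmed description of Phoenix's on-disk format. The helper names are hypothetical.

```python
# Assumption (not confirmed by the comment): each encoded column value is
# addressed by a fixed-width 4-byte int offset stored alongside the packed values.
OFFSET_BYTES = 4


def offset_costs_more(value_len: int, offset_bytes: int = OFFSET_BYTES) -> bool:
    """True when the per-column offset is larger than the value it points to."""
    return offset_bytes > value_len


def packed_cell_size(value_lens: list[int], offset_bytes: int = OFFSET_BYTES) -> int:
    """Bytes for all values packed into one cell plus one offset per value."""
    return sum(value_lens) + offset_bytes * len(value_lens)


for n in (3, 4, 15):
    print(n, "byte value -> offset costs more:", offset_costs_more(n))

# 5,000 columns of 15-byte values, as in the tests described in the issue.
print("packed size:", packed_cell_size([15] * 5000), "bytes")
```

Under this assumption, the offsets alone add 4 bytes per column. For 15-byte values that is overhead, not a net loss per value. This is why the absolute numbers requested below would help confirm or rule out the hypothesis.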
For test #2,
it could be because the space our encoding spends on offsets is roughly
equivalent to the key/value/prefix lengths (and the zero timestamp diff) that
FastDiff encoding stores. Also, the offset requires a more expensive data type
than just storing a length.
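A simplified illustration of why prefix-based encodings such as FastDiff favor the test #1 names. This is not HBase's FastDiff implementation. It only looks at column qualifiers (real KeyValue keys also include row key, family and timestamp) and measures how many leading characters each qualifier shares with its predecessor in sorted order, since a prefix encoder stores only the non-shared suffix. The name generator mirrors the two schemas described in the issue but is hypothetical. It is not the data generation class attached to PHOENIX-3560.

```python
import random
import string


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def avg_shared_prefix(names: list[str]) -> float:
    """Average prefix each name shares with its predecessor in sorted order.

    A prefix encoder only writes the suffix after the shared prefix, so a
    larger value here means fewer bytes spent per qualifier.
    """
    ordered = sorted(names)
    total = sum(shared_prefix_len(p, c) for p, c in zip(ordered, ordered[1:]))
    return total / (len(ordered) - 1)


# Test #1 style names: column_1 ... column_5000.
sequential = [f"column_{i}" for i in range(1, 5001)]

# Test #2 style names: 10-character random alphanumeric strings (seeded).
rng = random.Random(42)
alphabet = string.ascii_letters + string.digits
random_names = ["".join(rng.choice(alphabet) for _ in range(10)) for _ in range(5000)]

print(f"sequential names: {avg_shared_prefix(sequential):.2f} shared chars")
print(f"random names:     {avg_shared_prefix(random_names):.2f} shared chars")
```

Every sequential name shares at least the 7-character "column_" prefix with its neighbor, while sorted random names share only about two characters. This is consistent with FastDiff doing well in test #1 and poorly in test #2.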
[~mujtabachohan], can you share the absolute numbers for both test #1 and
test #2, if you have them handy? It would also be worth trying additional
compression such as Snappy or GZ to observe the effect.
> No significant space saving with immutable encoded column with large number
> of dense columns
> --------------------------------------------------------------------------------------------
>
> Key: PHOENIX-3582
> URL: https://issues.apache.org/jira/browse/PHOENIX-3582
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Mujtaba Chohan
> Assignee: Samarth Jain
>
> Tested with 2 schemas, both with 5K varchar columns. In test #1, columns were
> named column_1 ... column_5000, whereas in test #2 column names were 10-byte
> random alphanumeric strings. Each column is filled with 15 random bytes and all
> columns have values.
> For test #1, immutable encoded columns use ~4X *more* space than non-encoded
> columns. FastDiff encoding really shines when column names are highly
> compressible (column_1 ... column_5000).
> For test #2, the worst case, where column names are not compressible since
> they are random 10-byte alphanumeric strings, immutable encoded columns use 25%
> less space.
> Data generation class is attached to
> https://issues.apache.org/jira/browse/PHOENIX-3560.