[
https://issues.apache.org/jira/browse/PHOENIX-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983788#comment-16983788
]
Andrew Kyle Purtell commented on PHOENIX-5595:
----------------------------------------------
I think it would be fine to try for ZSTD as the default if available, and fall
back to FAST_DIFF and no compression if not. Or perhaps ROW_INDEX_V1 and GZ would
still be less bad than other options as a fallback, given GZ has a pure Java
implementation in every JRE. We only take the decompression hit once at read,
while with FAST_DIFF we take the decoding hit at every scan.
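
To make the fallback order concrete, here is a minimal sketch of the selection logic. Every name in it (DefaultCodecChooser, Choice, choose, and both boolean parameters) is hypothetical and is not an existing Phoenix or HBase API. How ZSTD availability would actually be detected is left to the caller.

```java
// Hypothetical sketch: none of these names are Phoenix or HBase APIs.
final class DefaultCodecChooser {

    /** A block encoding / compression pair, held as plain strings. */
    static final class Choice {
        final String encoding;
        final String compression;

        Choice(String encoding, String compression) {
            this.encoding = encoding;
            this.compression = compression;
        }

        @Override
        public String toString() {
            return "DATA_BLOCK_ENCODING=" + encoding + ", COMPRESSION=" + compression;
        }
    }

    /**
     * Picks table defaults in the order discussed in the comment.
     *
     * @param zstdAvailable    whether a usable ZSTD codec is present (assumed to be probed by the caller)
     * @param preferGzFallback true selects ROW_INDEX_V1 + GZ as the fallback,
     *                         false selects FAST_DIFF with no compression
     */
    static Choice choose(boolean zstdAvailable, boolean preferGzFallback) {
        if (zstdAvailable) {
            return new Choice("ROW_INDEX_V1", "ZSTD");
        }
        if (preferGzFallback) {
            return new Choice("ROW_INDEX_V1", "GZ");
        }
        return new Choice("FAST_DIFF", "NONE");
    }
}
```

The two-flag shape only mirrors the two fallback options raised above. Which fallback is the better default is the open question.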
> Use ROW_INDEX_V1 block encoding and zSTD compression by default
> ---------------------------------------------------------------
>
> Key: PHOENIX-5595
> URL: https://issues.apache.org/jira/browse/PHOENIX-5595
> Project: Phoenix
> Issue Type: Wish
> Reporter: Lars Hofhansl
> Priority: Major
>
> Phoenix defaults to FAST_DIFF block encoding and no compression (not needed
> with FAST_DIFF).
> I blogged about this extensively here:
> http://hadoop-hbase.blogspot.com/2018/10/apache-hbase-and-apache-phoenix-more-on.html
> We should switch the default to block encoding ROW_INDEX_V1 and compression
> zSTD for all newly created tables (including global indexes). Local indexes
> can stay with FAST_DIFF, but perhaps for completeness we should just switch
> everything.
> The only wrinkle is that FAST_DIFF also does compression (i.e. the diff
> encoding), and ROW_INDEX_V1 actually increases the block size a little bit
> since it keeps an index of row keys so that it can do a binary search inside
> an HFile block. Hence it needs to be paired with compression. Every test I
> did suggests that zSTD is the best.
> The main wrinkle is that zSTD needs a Hadoop/HBase build with native zSTD
> support compiled.
> I marked this as a Wish... Perhaps we can discuss here.
> What I do know is that FAST_DIFF has outgrown its usefulness. Seeking into
> FAST_DIFF is (naturally) slow since it needs to seek to the last known
> fully stored key and then play all the diffs forward from there to the actual
> row we want to seek to. This impacts GETs.
> zSTD also offers better compression and thus reduced IO even when paired with
> ROW_INDEX_V1.
> [~apurtell] What we discussed a while back.
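
The seek-cost argument in the description can be illustrated with a toy model. This is not HBase's FAST_DIFF or ROW_INDEX_V1 implementation: it uses a simplified prefix-diff scheme over string keys, and all names (SeekCostSketch, Delta, encode, diffSeekSteps, indexSeekSteps) are made up for the example. It only counts how many keys each approach has to touch to reach a target row.

```java
// Toy model only: a simplified prefix-diff scheme, not HBase's actual encoders.
final class SeekCostSketch {

    /** One diff-encoded entry: characters shared with the previous key, plus the new suffix. */
    static final class Delta {
        final int shared;
        final String suffix;

        Delta(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Encodes sorted keys as diffs against the previous key; the first key is stored in full. */
    static Delta[] encode(String[] sortedKeys) {
        Delta[] out = new Delta[sortedKeys.length];
        String prev = "";
        for (int i = 0; i < sortedKeys.length; i++) {
            String key = sortedKeys[i];
            int max = Math.min(prev.length(), key.length());
            int shared = 0;
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out[i] = new Delta(shared, key.substring(shared));
            prev = key;
        }
        return out;
    }

    /** Replays diffs from the fully stored first key; returns how many keys were rebuilt. */
    static int diffSeekSteps(Delta[] block, String target) {
        String current = "";
        int steps = 0;
        for (Delta d : block) {
            current = current.substring(0, d.shared) + d.suffix;
            steps++;
            if (current.compareTo(target) >= 0) {
                break;
            }
        }
        return steps;
    }

    /** Binary search over directly addressable keys; returns how many keys were compared. */
    static int indexSeekSteps(String[] sortedKeys, String target) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int steps = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            steps++;
            int cmp = sortedKeys[mid].compareTo(target);
            if (cmp == 0) {
                break;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return steps;
    }
}
```

In this model, seeking to the last of 1000 keys rebuilds every key in the diff-encoded block, while the binary search touches at most 10, which is the asymmetry the description attributes to GETs.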