[ 
https://issues.apache.org/jira/browse/PHOENIX-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983788#comment-16983788
 ] 

Andrew Kyle Purtell commented on PHOENIX-5595:
----------------------------------------------

I think it would be fine to try ZSTD as the default if available, and fall 
back to FAST_DIFF with no compression if not. Or perhaps ROW_INDEX_V1 with GZ 
would still be less bad than the other options as a fallback, given that GZ 
has a pure Java implementation in every JRE. We only take the decompression 
hit once, at read time, whereas with FAST_DIFF we pay the decoding cost at 
every scan. 
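
The fallback order described above could be sketched roughly as follows. This is only an illustration: the function name, its parameters, and the returned tuple are hypothetical and are not part of Phoenix or HBase, and how ZSTD availability would actually be detected is left out.

```python
# Hypothetical helper; the name, parameters, and return shape are illustrative only.
def choose_table_defaults(zstd_available, prefer_gz_fallback=False):
    """Return (data_block_encoding, compression) for a newly created table."""
    if zstd_available:
        # Preferred default when native zSTD support is present.
        return ("ROW_INDEX_V1", "ZSTD")
    if prefer_gz_fallback:
        # Alternative fallback: GZ has a pure Java implementation in every JRE.
        return ("ROW_INDEX_V1", "GZ")
    # Otherwise fall back to FAST_DIFF with no compression.
    return ("FAST_DIFF", "NONE")
```

Which of the two fallbacks is better is exactly the open question in this comment, so the sketch exposes it as a flag rather than picking one.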

> Use ROW_INDEX_V1 block encoding and zSTD compression by default
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5595
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5595
>             Project: Phoenix
>          Issue Type: Wish
>            Reporter: Lars Hofhansl
>            Priority: Major
>
> Phoenix defaults to FAST_DIFF block encoding and no compression (not needed 
> with FAST_DIFF).
> I blogged about this extensively here: 
> http://hadoop-hbase.blogspot.com/2018/10/apache-hbase-and-apache-phoenix-more-on.html
> We should switch the default to block encoding ROW_INDEX_V1 and compression 
> zSTD for all newly created tables (including global indexes). Local indexes 
> can stay with FAST_DIFF, but perhaps for completeness we should just switch 
> everything.
> One wrinkle is that FAST_DIFF also does compression (i.e. the diff 
> encoding), and ROW_INDEX_V1 actually increases the block size a little bit 
> since it keeps an index of row keys so that it can do a binary search inside 
> an HFile block. Hence it needs to be paired with compression. Every test I 
> did suggests that zSTD is the best.
> The main wrinkle is that zSTD needs a Hadoop/HBase build with native zSTD 
> support compiled in.
> I marked this as a Wish... Perhaps we can discuss here.
> What I do know is that FAST_DIFF has outgrown its usefulness. Seeking into 
> FAST_DIFF is (naturally) slow since it needs to seek to the last known 
> fully stored key and then play all the diffs forward from there to the actual 
> row we want to seek to. This impacts GETs.
> zSTD also offers better compression and thus reduced IO even when paired with 
> ROW_INDEX_V1.
> [~apurtell] What we discussed a while back.
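
The seek cost described in the issue can be illustrated with a toy model. The sketch below is a simplified assumption, not the actual HBase FAST_DIFF or ROW_INDEX_V1 encoders: it stores the first key of a block in full and every later key as a (shared prefix length, suffix) pair, then compares a replay-the-diffs seek against a binary search over a per-block row index. All function names are hypothetical.

```python
from bisect import bisect_left
import os.path

def diff_encode(keys):
    """Store the first key in full, then (shared_prefix_len, suffix) per key."""
    encoded, prev = [], ""
    for key in keys:
        shared = len(os.path.commonprefix([prev, key]))
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded

def diff_seek(encoded, target):
    """Replay diffs from the fully stored key until reaching a key >= target.

    Returns (key or None, number of entries decoded)."""
    prev = ""
    for steps, (shared, suffix) in enumerate(encoded, start=1):
        prev = prev[:shared] + suffix  # rebuild the key from the previous one
        if prev >= target:
            return prev, steps
    return None, len(encoded)

def index_seek(keys, target):
    """Binary search over a sorted per-block row index."""
    i = bisect_left(keys, target)
    return keys[i] if i < len(keys) else None

# Toy block of 1000 sorted row keys.
keys = [f"row{i:04d}" for i in range(1000)]
encoded = diff_encode(keys)
found, steps = diff_seek(encoded, "row0750")
```

In this model the diff-based seek decodes every entry up to the target, while the index-based seek needs only on the order of log2(1000), about 10, comparisons. The index itself takes extra space, which matches the point above that ROW_INDEX_V1 should be paired with compression.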



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
