Lars Hofhansl created PHOENIX-5595:
--------------------------------------

             Summary: Use ROW_INDEX_V1 block encoding and zSTD compression by default
                 Key: PHOENIX-5595
                 URL: https://issues.apache.org/jira/browse/PHOENIX-5595
             Project: Phoenix
          Issue Type: Wish
            Reporter: Lars Hofhansl


Phoenix defaults to FAST_DIFF block encoding and no compression (not needed 
with FAST_DIFF).

I blogged about this extensively here: 
http://hadoop-hbase.blogspot.com/2018/10/apache-hbase-and-apache-phoenix-more-on.html

We should switch the default to block encoding ROW_INDEX_V1 and compression 
zSTD for all newly created tables (including global indexes). Local indexes can 
stay with FAST_DIFF, but perhaps for completeness we should just switch 
everything.

One wrinkle is that FAST_DIFF also acts as compression (i.e. the diff 
encoding), whereas ROW_INDEX_V1 actually increases the block size a little bit, 
since it keeps an index of row keys so that it can do a binary search inside 
an HFile block. Hence it needs to be paired with compression. Every test I did 
suggests that zSTD is the best choice.
The other wrinkle is that zSTD needs a Hadoop/HBase build with native zSTD 
support compiled in.
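
To illustrate the row-index idea, here is a minimal toy sketch in Python. It is 
not HBase's actual on-disk format: the cell layout, the 4-byte offset width, and 
the names build_block, key_at and seek are all assumptions made up for this 
example. It only shows why a per-row index adds a few bytes per row while 
allowing a binary search inside a block.

```python
import struct

HEADER = struct.Struct(">HI")  # hypothetical cell header: key length, value length


def build_block(rows):
    """Serialize sorted (key, value) byte pairs back to back.

    Returns the block body plus a list holding the byte offset of each row.
    The offset list plays the role of the row index in this toy model.
    """
    body = bytearray()
    offsets = []
    for key, value in rows:
        offsets.append(len(body))
        body += HEADER.pack(len(key), len(value)) + key + value
    return bytes(body), offsets


def key_at(body, offset):
    """Read the row key of the cell that starts at the given offset."""
    key_len, _ = HEADER.unpack_from(body, offset)
    start = offset + HEADER.size
    return body[start:start + key_len]


def seek(body, offsets, target):
    """Binary search the offsets for the first row with key >= target.

    Returns that row's byte offset, or None if every key is smaller.
    """
    lo, hi = 0, len(offsets)
    while lo < hi:
        mid = (lo + hi) // 2
        if key_at(body, offsets[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return offsets[lo] if lo < len(offsets) else None


rows = [(b"row%03d" % i, b"v") for i in range(100)]
body, offsets = build_block(rows)
index_bytes = 4 * len(offsets)  # assumed 4 bytes per stored offset
found = seek(body, offsets, b"row042")
```

In this model the index costs a fixed number of bytes per row on top of the 
unencoded cells, which is the overhead that compression is meant to absorb.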

I marked this as a Wish... Perhaps we can discuss here.

What I do know is that FAST_DIFF has outlived its usefulness. Seeking into a 
FAST_DIFF block is (naturally) slow, since the reader has to seek to the last 
known fully stored key and then play all the diffs forward from there to the 
actual row we want to seek to. This impacts GETs.
zSTD also offers better compression and thus reduced IO, even when paired with 
ROW_INDEX_V1.
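
The seek cost of a diff encoding can be sketched with a toy prefix-diff scheme 
in Python. This is not the FAST_DIFF format itself: the restart_every parameter 
(how often a key is stored in full) and the functions encode and key_at are 
hypothetical, chosen only to show that reaching a row means replaying every 
diff since the last fully stored key.

```python
def common_prefix_len(a, b):
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode(keys, restart_every=16):
    """Store every restart_every-th key in full, the rest as diffs.

    Each entry is (shared_prefix_len, suffix) relative to the previous key.
    """
    entries = []
    prev = b""
    for i, key in enumerate(keys):
        if i % restart_every == 0:
            entries.append((0, key))  # fully stored key
        else:
            shared = common_prefix_len(prev, key)
            entries.append((shared, key[shared:]))
        prev = key
    return entries


def key_at(entries, index, restart_every=16):
    """Rebuild the key at index; also return how many diffs were replayed."""
    start = index - index % restart_every  # last fully stored key
    key = entries[start][1]
    replayed = 0
    for shared, suffix in entries[start + 1:index + 1]:
        key = key[:shared] + suffix
        replayed += 1
    return key, replayed


keys = [b"row%05d" % i for i in range(64)]
entries = encode(keys)
key, replayed = key_at(entries, 31)
```

In this sketch the work per lookup grows with the distance from the last full 
key, whereas a binary search over a row index needs only a logarithmic number 
of key comparisons per block.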

[~apurtell] What we discussed a while back.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
