Lars Hofhansl created PHOENIX-5595:
--------------------------------------
Summary: Use ROW_INDEX_V1 block encoding and zSTD compression by default
Key: PHOENIX-5595
URL: https://issues.apache.org/jira/browse/PHOENIX-5595
Project: Phoenix
Issue Type: Wish
Reporter: Lars Hofhansl
Phoenix defaults to FAST_DIFF block encoding and no compression (not needed
with FAST_DIFF).
I blogged about this extensively here:
http://hadoop-hbase.blogspot.com/2018/10/apache-hbase-and-apache-phoenix-more-on.html
We should switch the default to block encoding ROW_INDEX_V1 and compression
zSTD for all newly created tables (including global indexes). Local indexes can
stay with FAST_DIFF, but perhaps for completeness we should just switch
everything.
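
For illustration, here is a rough sketch of what the proposed defaults look like
when spelled out per table in the DDL. The helper with_table_options and the
table name example_table are made up for this example; DATA_BLOCK_ENCODING and
COMPRESSION are the usual HBase table properties that Phoenix passes through,
and I am assuming an HBase build that knows both ROW_INDEX_V1 and ZSTD.

```python
# Hypothetical helper: append HBase table properties to a Phoenix CREATE statement.
def with_table_options(ddl, options):
    rendered = ", ".join(f"{name}='{value}'" for name, value in options.items())
    return f"{ddl} {rendered}"


# The defaults proposed in this issue, written as explicit table properties.
PROPOSED_DEFAULTS = {
    "DATA_BLOCK_ENCODING": "ROW_INDEX_V1",
    "COMPRESSION": "ZSTD",
}

# example_table is a placeholder; any CREATE TABLE statement would do.
ddl = with_table_options(
    "CREATE TABLE IF NOT EXISTS example_table "
    "(id BIGINT NOT NULL PRIMARY KEY, payload VARCHAR)",
    PROPOSED_DEFAULTS,
)
print(ddl)
```

With the change proposed here, a plain CREATE TABLE without these properties
would end up with the same settings.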
One wrinkle is that FAST_DIFF also does compression (i.e. the diff
encoding), and ROW_INDEX_V1 actually increases the block size a little bit
since it keeps an index of row keys so that it can do binary search inside
an HFile block. Hence it needs to be paired with compression. Every test I did
suggests that zSTD is the best.
The main wrinkle is that zSTD needs a Hadoop/HBase build with native zSTD
support compiled in.
I marked this as a Wish... Perhaps we can discuss here.
What I do know is that FAST_DIFF has outlived its usefulness. Seeking into a
FAST_DIFF block is (naturally) slow, since the reader has to seek to the last
known fully stored key and then play all the diffs forward from there to the
row we actually want. This impacts GETs.
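
To make the seek cost concrete, here is a toy model in Python. It is not the
HBase encoder: the prefix-diff scheme, the function names and the key layout
are all simplifications I made up for this sketch. It only shows why a
diff-encoded block has to be replayed linearly while a block with a row index
can be binary searched.

```python
def diff_encode(keys):
    """Store each key as (shared_prefix_len, suffix) relative to the previous key.

    The first key has no predecessor, so it is stored in full.
    """
    encoded, prev = [], ""
    for key in keys:
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def seek_diff(encoded, target):
    """Replay diffs from the fully stored key; return (position, keys_decoded)."""
    prev, decoded = "", 0
    for position, (shared, suffix) in enumerate(encoded):
        prev = prev[:shared] + suffix  # rebuild the key from its predecessor
        decoded += 1
        if prev >= target:
            return position, decoded
    return len(encoded), decoded


def seek_indexed(keys, target):
    """Binary search over directly addressable keys; return (position, probes)."""
    lo, hi, probes = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo, probes


# A made-up block of 1024 sorted row keys.
keys = [f"row{i:05d}" for i in range(1024)]
target = "row01000"

diff_pos, diff_cost = seek_diff(diff_encode(keys), target)
index_pos, index_cost = seek_indexed(keys, target)
print(diff_pos, diff_cost, index_pos, index_cost)
```

Both seeks land on the same row, but the diff replay has to rebuild every key
before it, while the indexed lookup needs only about log2(n) comparisons.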
zSTD also offers better compression and thus reduced IO even when paired with
ROW_INDEX_V1.
[~apurtell] What we discussed a while back.