Bryan Beaudreault created HBASE-28343:
-----------------------------------------

             Summary: Write codec class into hfile header/trailer
                 Key: HBASE-28343
                 URL: https://issues.apache.org/jira/browse/HBASE-28343
             Project: HBase
          Issue Type: Improvement
            Reporter: Bryan Beaudreault


We recently started playing around with the new bundled compression libraries 
as of 2.5.0. Specifically, we are experimenting with the different zstd codecs. 
The book says that aircompressor's zstd is not data compatible with Hadoop's, 
but doesn't say the same about zstd-jni.

In our experiments we ended up in a state where some hfiles were encoded with 
zstd-jni (zstd.ZstdCodec) while others were encoded with Hadoop's codec 
(ZStandardCodec). At this point the cluster became extremely unstable, with 
some files unable to be read because they were encoded with a codec that didn't 
match the current runtime configuration. Changing the runtime configuration 
just made the other set of files unreadable instead.

I think this problem could be solved by writing the classname of the codec used 
into the hfile. This could be used as a hint so that a regionserver can read 
hfiles compressed with any compression codec that it supports.
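
A rough sketch of the idea in Java, to make the proposal concrete. Nothing here 
is an existing HBase API: CodecHintSketch, Trailer, the codecClassName field, 
and the "example.*" codec names are all hypothetical stand-ins. It assumes the 
writer records the codec class name in the file, and the reader prefers that 
name over the runtime configuration when it is present.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch only; none of these types are HBase APIs. */
class CodecHintSketch {

  /** Stand-in for an hfile trailer that records which codec class wrote the file. */
  static final class Trailer {
    final String codecClassName; // null models a file written before the field existed

    Trailer(String codecClassName) {
      this.codecClassName = codecClassName;
    }
  }

  // Stand-in registry: codec class name -> decompress function.
  private final Map<String, UnaryOperator<byte[]>> decompressors = new HashMap<>();
  // The codec class name the runtime is currently configured to use.
  private final String configuredCodec;

  CodecHintSketch(String configuredCodec) {
    this.configuredCodec = configuredCodec;
  }

  void register(String className, UnaryOperator<byte[]> decompressor) {
    decompressors.put(className, decompressor);
  }

  /** Prefer the class name recorded in the file; fall back to configuration if absent. */
  String resolveCodec(Trailer trailer) {
    String name = trailer.codecClassName != null ? trailer.codecClassName : configuredCodec;
    if (!decompressors.containsKey(name)) {
      // Fail with a clear message instead of decoding with the wrong codec.
      throw new IllegalStateException("No codec available for " + name);
    }
    return name;
  }

  /** Decompress a block with whichever codec resolveCodec selects for this file. */
  byte[] read(Trailer trailer, byte[] block) {
    return decompressors.get(resolveCodec(trailer)).apply(block);
  }
}
```

With both codecs registered, a file whose trailer names "example.CodecB" would 
be read with that codec even when the runtime is configured for 
"example.CodecA", and files without the field keep today's behavior. Failing 
fast on an unknown class name is one possible choice here, not a requirement 
of the proposal.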

[~apurtell] do you have any thoughts here since you brought us all of these 
great compression options?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
